Closed Bug 817342 Opened 12 years ago Closed 11 years ago

Crash [@ js::GCMarker::processMarkStackTop(js::SliceBudget&) ] with Redirector Extension

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox19 --- unaffected
firefox20 + fixed
firefox21 --- fixed
firefox-esr17 --- unaffected
b2g18 --- unaffected

People

(Reporter: fehe, Assigned: smaug)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [js:p1][adv-main20-])

Crash Data

Attachments

(4 files)

Attached file stack trace —
As a result of bug 790338, there is a new crash that can be exposed when the Print Edit and Redirector extensions are present and Print Edit is invoked.

bp-da0b6941-aaee-4977-aa90-fe7372121201

Regression window: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7e6ac3bfbae9&tochange=26172ff887ae

STR
1. Launch Firefox with a new profile
2. Install the following extensions and restart Firefox:
a) Print Edit: https://addons.mozilla.org/en-US/firefox/addon/print-edit/
b) Redirector: https://addons.mozilla.org/en-US/firefox/addon/redirector/
3. Click the home button to load the Firefox default home page
4. Right-click the page and choose Print... --> Print Edit
5. Wait a few seconds


Crashing Thread
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js::GCMarker::processMarkStackTop 	js/src/gc/Marking.cpp:1348
1 	mozjs.dll 	js::GCMarker::drainMarkStack 	js/src/gc/Marking.cpp:1407
2 	mozjs.dll 	MarkGrayReferences 	js/src/jsgc.cpp:2743
3 	mozjs.dll 	EndMarkingCompartmentGroup 	js/src/jsgc.cpp:3214
4 	mozjs.dll 	SweepPhase 	js/src/jsgc.cpp:3441
5 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp:3879
6 	mozjs.dll 	GCCycle 	js/src/jsgc.cpp:4006
7 	mozjs.dll 	Collect 	js/src/jsgc.cpp:4121
8 	mozjs.dll 	js::GCFinalSlice 	js/src/jsgc.cpp:4166
9 	xul.dll 	CCTimerFired 	dom/base/nsJSEnvironment.cpp:3256
10 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:565
11 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:627
12 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:82
13 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:208
14 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:182
15 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:163
16 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:229
17 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:290
18 	xul.dll 	XREMain::XRE_mainRun 	toolkit/xre/nsAppRunner.cpp:3823
19 	xul.dll 	XREMain::XRE_main 	toolkit/xre/nsAppRunner.cpp:3890
20 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4084
21 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:105
22 	firefox.exe 	__tmainCRTStartup 	crtexe.c:552
23 	kernel32.dll 	BaseThreadInitThunk 	
24 	ntdll.dll 	__RtlUserThreadStart 	
25 	ntdll.dll 	_RtlUserThreadStart
Blocks: 790338
Crash Signature: [@ js::GCMarker::processMarkStackTop(js::SliceBudget&) ]
Keywords: crash, regression
Excellent! I'm able to reproduce on Windows. Setting OS to "Windows 7" even though it also happens on XP.
OS: All → Windows 7
Blocks: 719114
Keywords: reproducible
Version: Trunk → 20 Branch
Assignee: general → wmccloskey
Whiteboard: [js:p1]
I could have sworn I tried this earlier in a debug build, but I guess I forgot. Anyway, this trips a compartment assertion in a Linux debug build for me. It seems to have something to do with the document's JS wrapper not having the same global as the window. It also nearly trips this assertion in nsDocument::SetScriptGlobalObject:

              NS_ASSERTION(JS_GetGlobalForObject(cx, obj) == newScope,
                           "Wrong scope, this is really bad!");

However, the call to JS_GetGlobalForObject dies first because cx->compartment != obj->compartment().

Could a DOM person please look into this?
Assignee: wmccloskey → nobody
Group: core-security
Component: JavaScript Engine → DOM
OS: Windows 7 → All
Version: 20 Branch → Trunk
Attached file stack trace of assertion —
Here's a stack trace of the assertion. It might have something to do with printing.
So in this case we have:

1)  The document has a wrapper.
2)  The JSContext of the new global is not same-compartment with that wrapper.
3)  In either case, the global of the document's wrapper is not the new global.

Presumably the context compartment either matches the new global or matches whatever script is running on that context...
This could potentially be due to bugs in the addons themselves, maybe moving objects between compartments when they shouldn't.
Keywords: sec-moderate
Whiteboard: [js:p1] → [js:p1] could be bug(s) in the addons?
(In reply to Daniel Veditz [:dveditz] from comment #5)
> This could potentially be due to bugs in the addons themselves, maybe moving
> objects between compartments when they shouldn't.

These addons don't have any binary components. And stuff written in JS is supposed to be safe against compartment mismatches. So I think it's our bug.
Whiteboard: [js:p1] could be bug(s) in the addons? → [js:p1]
If nobody else is looking at this, I can try to take a look.
Go for it. It would be great to get this fixed.
Assignee: nobody → continuation
In addition to Print Edit, doing Print Preview or even just Print seem to cause the same problem.

The Print Edit addon doesn't even seem to be needed. With just Redirector enabled, using the standard "print" command on the default homepage causes the same problem. Disabling the addon causes the problem to go away.
Also, the STR for the crash indicates that it takes a few seconds (suggesting that it needs to wait for a GC to run), but the assertion happens immediately. It might be useful to see if this trips the assert before the above regression range where the crash starts happening.
(In reply to Andrew McCreight [:mccr8] from comment #10)
> Also, the STR for the crash indicates that it takes a few seconds
> (suggesting that it needs to wait for a GC to run), but the assertion
> happens immediately. It might be useful to see if this trips the assert
> before the above regression range where the crash starts happening.

I'm reasonably certain that this has been a problem forever. If that's not true, please assign it back to me.
Olli, could you take a look at this? It looks like it is a problem in printing code, and you are more familiar with it than me.
Summary: Crash [@ js::GCMarker::processMarkStackTop(js::SliceBudget&) ] with Print Edit and Redirector Extensions → Crash [@ js::GCMarker::processMarkStackTop(js::SliceBudget&) ] with Redirector Extension
Assignee: continuation → bugs
Will look at this next week. Sorry about the delay.
Ghostery also seems to cause the same problem.
Yup, bug 825380 is filed on that (but is probably a dupe of this).
So we end up running script runners for <link> for example and content policy touches document which
doesn't have script global object yet. Printing is pretty unique here, because of its cloning.
Patch coming.
Attached patch -w — — Splinter Review
Attached patch patch — — Splinter Review
Make sure we don't run things which might cause content policy checks etc
before the documents have their script globals set.

It would be nice to have script blocker for longer time, but unfortunately
printing related dialogs require scripts to run.
Though, it is only chrome scripts which can access documents, so running 
scripts isn't that bad.
Attachment #700022 - Flags: review?(roc)
Comment on attachment 700022 [details] [diff] [review]
patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not easy, and as far as I see it requires some chrome js (like addons).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Comment will be about delaying script runners in order to run them all at the same time.
Kind of vague. Nothing about compartments though.

Which older supported branches are affected by this flaw?
All

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
The same patch should applies everywhere

How likely is this patch to cause regressions; how much testing does it need?
Changes to printing are always a bit regression risky since we don't have good tests (because for many
things we just don't have the testing framework which could handle such tests).
Attachment #700022 - Flags: sec-approval?
Comment on attachment 700022 [details] [diff] [review]
patch

Per https://wiki.mozilla.org/Security/Bug_Approval_Process this shouldn't
need approval.

I'll land tomorrow.
Attachment #700022 - Flags: review+
Comment on attachment 700022 [details] [diff] [review]
patch

Olli cleared the wrong flag. Carrying forward roc's r+.
Attachment #700022 - Flags: sec-approval? → review+
(In reply to Olli Pettay [:smaug] from comment #21)
> Comment on attachment 700022 [details] [diff] [review]
> patch
> 
> Per https://wiki.mozilla.org/Security/Bug_Approval_Process this shouldn't
> need approval.

That's right.
https://crash-stats.mozilla.com/report/list?signature=js%3A%3AGCMarker%3A%3AprocessMarkStackTop%28js%3A%3ASliceBudget%26%29 is a top crash across all our somewhat recent versions, what amount of those crashes do you think this patch should fix? If a major part, we should consider to uplift this to as many versions as possible.
It is difficult to say, as that is a generic stack we hit any time there is any kind of GC-related problem. This patch only affects people trying to do print-preview with certain addons (Redirector, Ghostery, probably others).

In Nightly, my theory is that these errors are bucketed somewhere else due to additional runtime checks (js::CompartmentChecker::fail), so if we look at the Nightly 20 numbers, we see that 1.37% of crashes have the compartment checker assertion, and 1.14% have the processMarkStackTop stack.

If we look at Aurora 19, we see that 3.19% of crashes are processMarkStackTop, which happens to be in the ballpark of 1.37+1.14. Thus, very optimistically, we could say we might get rid of half of them.

On the other hand, I would guess these kinds of addons are much rarer beyond Nightly and Aurora, and thus this patch may have little overall effect. It may also be the case that this problem doesn't result in crashes all the time without the assertion, so we are overestimating the % that seem print preview related, and thus landing this fix will not help nearly so much.

Another way to get an estimate would be to see what % of processMarkStackTop crashes have Ghostery or this Redirector thing, compared to the over all number. If the corrrelation is high, that suggests a decent number could also be print preview related, and thus fixed by this bug.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> If a major part, we should consider to uplift this to as many versions as possible.
This bug is a regression in Firefox 20 (see comment 0 and tracking flags) so it should be uplifted only in Aurora.
https://hg.mozilla.org/mozilla-central/rev/5a74a94f6d44
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I think we could uplift the patch, but I'd like to wait for few days to get at least tiny
amount testing.
Nominating for tracking as a speculative partial fix for a top crash.
(In reply to Scoobidiver from comment #27)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> > If a major part, we should consider to uplift this to as many versions as possible.
> This bug is a regression in Firefox 20 (see comment 0 and tracking flags) so
> it should be uplifted only in Aurora.

That's not exactly true. This particular test case only reproduces in FF20. However, as far as I know the bug has been present since long before that. It just didn't reproduce in this circumstance until bug 790338. I think we should uplift everywhere we can, although it's up to Olli to evaluate the risk of that.
This needs some testing. Manual testing I'm afraid.
Unfortunately printing is so rarely used feature nowadays that regression bugs may be filed
months after fixing the bug.
(In reply to Olli Pettay [:smaug] from comment #32)
> This needs some testing. Manual testing I'm afraid.
> Unfortunately printing is so rarely used feature nowadays that regression
> bugs may be filed
> months after fixing the bug.

Adding qawanted to fulfil this request. I've talked with Olli via IRC about this and we agreed to getting some broad, generic print testing using the Softvision team. I'll coordinate this testing and try to report back any results in the next week.
Keywords: qawanted
QA Contact: anthony.s.hughes
The print testing has been completed and is currently stored on a protected Google Doc. Olli, I'm working on getting the doc shared with you so you can view the raw results. If you have no questions or concerns with the results I will try to post a distilled version in this and related bugs.
So I didn't see any regressions based on that doc. Anthony, could you verify (once you're back).
Comment on attachment 700022 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 487667 and compartments
User impact if declined: Crashes
Testing completed (on m-c, etc.): Landed m-c 2013-01-10 and see comment 33 and comment 35
Risk to taking this patch (and alternatives if risky): Tiny bit risky, since
script runners end up running later
String or UUID changes made by this patch: NA
Attachment #700022 - Flags: approval-mozilla-aurora?
Release mode assertions that this patch fixes are something like 3% of all crashes on Aurora.  Every time somebody with one of a wide variety of addons tries to do print preview, they hit this assertion.  The assertion is disabled in beta and later, but at least some of these will turn into real crashes.
Comment on attachment 700022 [details] [diff] [review]
patch

Willing to take the risk on Aurora since this might help with some topcrashers and qa should continue to verify as well as watch for regressions since we'll have time to backout if the payoff isn't what we hoped here.
Attachment #700022 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 836875
(In reply to Olli Pettay [:smaug] from comment #37)
> So I didn't see any regressions based on that doc. Anthony, could you verify
> (once you're back).

I confirm that the testing found no regressions.
Keywords: qawanted
Target Milestone: --- → mozilla21
Whiteboard: [js:p1] → [js:p1][adv-main20-]
Group: core-security
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: