Closed Bug 1208622 Opened 4 years ago Closed 4 years ago

Crash in M-e10s-1 on linux64 with APZ enabled due to IsCallerChrome() being called without AutoJSAPI on the stack

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
e10s + ---
firefox44 --- fixed

People

(Reporter: kats, Assigned: bholley)

References

Details

Attachments

(2 files, 1 obsolete file)

Attached file Crash stack
I landed bug 1143856 and it was backed out due to a M-e10s-1 failure on Linux64 asan and Linux64 debug. Looking at the logs for those failures (e.g. [1]) there is a crash stack (attached). The root cause appears to be that IsCallerChrome() is called without having an AutoJSAPI on the stack. bholley said he would write a patch for it [2].


[1] http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1443203735/mozilla-inbound_ubuntu64_vm-debug_test-mochitest-e10s-1-bm121-tests1-linux64-build198.txt.gz
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1143856#c12
The specific test that is failing is dom/events/test/test_bug574663.html and I presume this would be reproducible locally using:

./mach mochitest -f plain --e10s --setpref layers.async-pan-zoom.enabled=true dom/events/test/test_bug574663.html

on an appropriate linux64 system/build.
Attached patch Separate API entry points. v1 (obsolete) — Splinter Review
kats, can you let me know if this fixes the crash for you? I don't have a
linux build handy.
Attachment #8666232 - Flags: review?(bzbarsky)
Attachment #8666232 - Flags: feedback?(bugmail.mozilla)
Comment on attachment 8666232 [details] [diff] [review]
Separate API entry points. v1

Review of attachment 8666232 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the quick patch! This fixes the issue for me locally. I also did a try push with APZ enabled to make sure it works on try (and also to make sure nothing else is broken). https://treeherder.mozilla.org/#/jobs?repo=try&revision=344c1090f7de
Attachment #8666232 - Flags: feedback?(bugmail.mozilla) → feedback+
I refreshed my local build today to play around with some spell checking issues.

On right-click in textarea (to inspect the dictionary setting) I get a crash in
mozilla-central/dom/base/nsContentUtils.cpp line 2713:
MOZ_CRASH("Accessing the Subject Principal without an AutoJSAPI on the stack is forbidden");

That's on Windows with no e10s.

HTML used:
<body>
<textarea lang="en-US">this is English text</textarea>
</body>

I applied the patch in attachment 8666232 [details] [diff] [review] and that doesn't help. Please try it yourself, test case is dead simple: Open a document with the content as indicated. Right-click, crash.

I will spare you the stack trace, only the first few lines:
xul.dll!nsContentUtils::SubjectPrincipal() Line 2713	C++
xul.dll!nsContentUtils::IsCallerChrome() Line 2062	C++
xul.dll!nsContentUtils::CanAccessNativeAnon() Line 6101	C++
xul.dll!mozilla::dom::UIEvent::GetRangeParent() Line 234	C++
xul.dll!mozilla::dom::UIEvent::GetRangeParent(nsIDOMNode * * aRangeParent) Line 249	C++
xul.dll!mozilla::dom::MouseEvent::GetRangeParent(nsIDOMNode * * aRangeParent) Line 32	C++
xul.dll!nsXULPopupManager::InitTriggerEvent(nsIDOMEvent * aEvent, ...) Line 579	C++
xul.dll!nsXULPopupManager::ShowPopupAtScreen(nsIContent * aPopup, ...) Line 747	C++
xul.dll!nsXULPopupListener::LaunchPopup(nsIDOMEvent * aEvent, nsIContent * aTargetContent) Line 435	C++
xul.dll!nsXULPopupListener::HandleEvent(nsIDOMEvent * aEvent) Line 216	C++

How did this go broken?
> On right-click in textarea

That's bug 1208680.

> How did this go broken?

Lack of tests, presumably.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> I also did
> a try push with APZ enabled to make sure it works on try (and also to make
> sure nothing else is broken).
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=344c1090f7de

Well at least this problem is fixed but I might have to fix some other things before I can reland my patch.
(In reply to Boris Zbarsky [:bz] from comment #6)
> That's bug 1208680.
I got close ;-) That one doesn't mention the crash details.

> Lack of tests, presumably.
extensions/spellcheck/tests/mochitest/test_bug1170484.html exercises the right-click menu, but only on a contenteditable, not a text area.
If you have time to add test coverage to trigger this crash, that would be awesome.
Duplicate of this bug: 1208740
Comment on attachment 8666232 [details] [diff] [review]
Separate API entry points. v1

Review of attachment 8666232 [details] [diff] [review]:
-----------------------------------------------------------------

smaug, if you get to this review first (and want to push it), feel free.
Attachment #8666232 - Flags: review?(bugs)
GetContent(nsIDOMWindow** aContent) is [noscript], so 
MOZ_RELEASE_ASSERT(nsContentUtils::IsCallerChrome(), "This should only be called by script over XPIDL"); there looks odd. The .idl indicates that calling from any C++ should be ok.
Use LegacyIsCallerChromeOrNativeCode() there?

LegacyIsCallerNativeCode() might be even better, but I don't know at which point some binary component might call the method (whether there is JS on stack).
(I think binary addons might be gone, but other forms of external binary components are still there.)
Comment on attachment 8666232 [details] [diff] [review]
Separate API entry points. v1

>+already_AddRefed<nsIDOMWindow>
>+nsGlobalWindow::GetContent()
>+{
>+  MOZ_ASSERT(IsOuterWindow());
>+  ErrorResult ignored;
>+  return GetContentInternal(ignored, /* aUnprivilegedCaller = */ false);
>+}
So I wouldn't add this method, since .content is rather odd, rarely used in C++, beast anyway


> nsGlobalWindow::GetContent(nsIDOMWindow** aContent)
> {
>   FORWARD_TO_OUTER(GetContent, (aContent), NS_ERROR_UNEXPECTED);
> 
>+  MOZ_RELEASE_ASSERT(nsContentUtils::IsCallerChrome(), "This should only be called by script over XPIDL");
But just assert LegacyIsCallerChromeOrNativeCode()?... I don't actually know.


Just make GetContentInternal to use LegacyIsCallerChromeOrNativeCode?



(I would perhaps rename LegacyIsCallerChromeOrNativeCode() to IsCallerChromeOrNativeCode(), and then replace most of the current IsCallerChrome with that.
 But I can understand why getting rid of IsCaller* would be even better.)
Attachment #8666232 - Flags: review?(bugs)
Comment on attachment 8666232 [details] [diff] [review]
Separate API entry points. v1

>+  nsCOMPtr<nsPIDOMWindow> rootWindow = do_QueryInterface(mDocumentNode->GetWindow());

nsIDocument::GetWindow returns nsPIDOMWindow*.  Why do you need the QI and strong ref?

>+  MOZ_RELEASE_ASSERT(nsContentUtils::IsCallerChrome(), "This should only be called by script over XPIDL");

It can't be called by script over XPIDL.  Script can only call the webidl methods on window.

The only possible caller of this method is some sort of binary addon calling the XPCOM method, no?  So I don't think we can say anything useful about whether script is on the stack here.  :(

>+nsGlobalWindow::GetContent()

Why can't this live on nsPIDOMWindow and avoid the Cast bit?
Attachment #8666232 - Flags: review?(bzbarsky) → review-
(In reply to Boris Zbarsky [:bz] from comment #14)
> nsIDocument::GetWindow returns nsPIDOMWindow*.  Why do you need the QI and
> strong ref?

Good point.

> >+  MOZ_RELEASE_ASSERT(nsContentUtils::IsCallerChrome(), "This should only be called by script over XPIDL");
> 
> It can't be called by script over XPIDL.  Script can only call the webidl
> methods on window.
> 
> The only possible caller of this method is some sort of binary addon calling
> the XPCOM method, no?  So I don't think we can say anything useful about
> whether script is on the stack here.  :(

Great! Let's just get rid of it then.

> >+nsGlobalWindow::GetContent()
> 
> Why can't this live on nsPIDOMWindow and avoid the Cast bit?

Because that would make this method unnecessary virtual, right? It seems like we're generally moving away from that and to nsPIDOMWindow->nsGlobalWindow casts, so I figured adding a cast method would clarify that invariant.
Attachment #8666882 - Flags: review?(bzbarsky)
Attachment #8666232 - Attachment is obsolete: true
> Great! Let's just get rid of it then.

Worksforme.

> Because that would make this method unnecessary virtual, right?

That's fair...  We should really try to nix this nsGlobalWindow/nsPIDOMWindow split somehow.  :(
Comment on attachment 8666882 [details] [diff] [review]
Separate API entry points. v2

r=me, but watch out for binary extension breakage due to the vtable change....
Attachment #8666882 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] from comment #19)
> Comment on attachment 8666882 [details] [diff] [review]
> Separate API entry points. v2
> 
> r=me, but watch out for binary extension breakage due to the vtable
> change....

The IID change should catch us there, right?
> The IID change should catch us there, right?

Only for people who QI to the interface.  For people who just get the interface from some getter... well, see http://hg.mozilla.org/mozilla-central/rev/8c96466a1077 for the sort of games we may end up having to play.  :(
Huh. But don't we force a recompile by bumping kVersion with each release?
You have bustage: error C2039: 'GetContent' : is not a member of 'nsIDOMWindow'

Going to back it out unless you can land a fix in the next 5 to 10 mins.
> Huh. But don't we force a recompile by bumping kVersion with each release?

I don't think that helps with things that dll-inject, right?  Anyway, just something to watch out for. :(
The command from comment 1 doesn't seem to reproduce a crash anymore, maybe because bug 1072150 got backed out? I don't have access to that bug but it appears to be in the REOPENED state so I'm assuming that's what happened.
No longer blocks: apz-desktop, apz-linux
Yes it was backed out. There were a bunch of similar crashes.
Thanks. This should block 1072150 then.
Blocks: 1072150
(In reply to Bobby Holley (:bholley) from comment #9)
> If you have time to add test coverage to trigger this crash, that would be
> awesome.
I'm adding another test in bug 1209414 that would have triggered the crash in bug 1208680.
That was on my to-do list anyway. Sorry I can't help with this crash here.
https://hg.mozilla.org/mozilla-central/rev/d8494bbabea7
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.