Assert that labeled runnables don't touch anything outside their TabGroup

RESOLVED FIXED in Firefox 54

Status

()

Core
DOM
RESOLVED FIXED
10 months ago
2 months ago

People

(Reporter: billm, Assigned: billm)

Tracking

(Depends on: 1 bug, Blocks: 2 bugs)

unspecified
mozilla54
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 fixed)

Details

Attachments

(8 attachments, 3 obsolete attachments)

(Assignee)

Description

10 months ago
This bug will track the work to assert if anyone ever labels a runnable with the wrong TabGroup.
(Assignee)

Updated

10 months ago
Depends on: 1334572
(Assignee)

Updated

10 months ago
Depends on: 1334346
(Assignee)

Comment 1

10 months ago
Created attachment 8834667 [details] [diff] [review]
move Dispatcher.{cpp,h} to xpcom/threads

The goal of the first couple patches here is to create a base class, ValidatingDispatcher, that asserts that any runnable dispatch within it does not touch stuff that it shouldn't. Initially, TabGroup and SystemGroup will inherit from ValidatingDispatcher. When we make more progress separating runnables, DocGroup will inherit from ValidatingDispatcher and TabGroup no longer will.

The way I propose to do this is to put Dispatcher and ValidatingDispatcher inside xpcom/threads. I think it makes the most sense here since it needs to know about things like whether we're in a nested event loop. Eventually, the scheduler code to schedule tasks on "fibers" will be part of this as well, and I think it makes more sense to put it in XPCOM than in DOM.

Nathan, this patch will give you some deja vu. I hope the explanation above seems sensible.
Attachment #8834667 - Flags: review?(nfroyd)
(Assignee)

Comment 2

10 months ago
Created attachment 8834671 [details] [diff] [review]
move TabGroup code to ValidatingDispatcher

This patch creates the ValidatingDispatcher class and moves some of the TabGroup code into it. See the previous comment for more explanation.
Attachment #8834671 - Flags: review?(ehsan)
(Assignee)

Comment 3

10 months ago
Created attachment 8834672 [details] [diff] [review]
make a SystemGroup singleton

This patch makes a SystemGroup singleton instance that inherits from ValidatingDispatcher. This way SystemGroup now gets all the functionality of ValidatingDispatcher for free.
Attachment #8834672 - Flags: review?(ehsan)
(Assignee)

Comment 4

10 months ago
Created attachment 8834673 [details] [diff] [review]
JS engine assertions

This patch allows us to ask the JS engine to assert that only certain compartments are used in certain conditions. This will be used to ensure that, if a runnable is labeled with TabGroup G, then only compartments associated with G can run JS code while that runnable is running.
Attachment #8834673 - Flags: review?(sphink)
(Assignee)

Comment 5

10 months ago
Created attachment 8834677 [details] [diff] [review]
gecko assertions

This patch does most of the work to add the assertions. The assertions are in the JS engine as well as in nsDocument::BeginUpdate and event dispatch. These seem like pretty good choke points to me.

There are a couple issues here:

1. The need to wrap every labeled runnable is kind of unfortunate. However, since Andreas is already doing that for telemetry, I figure we can fold these two together. I'll rebase on top of his changes assuming he lands first.

2. The lifetime of the mValidAccess pointer is a little worrisome. Basically, the TabGroup needs to live at least as long as any JS compartment associated with it. I think that should be true because:
  i) the global object for the compartment is the inner window,
  ii) inner windows keep their outer windows alive, and
  iii) outer windows keep their TabGroup alive.
Attachment #8834677 - Flags: review?(ehsan)
(Assignee)

Updated

10 months ago
Depends on: 1337577
(Assignee)

Comment 6

10 months ago
Created attachment 8834707 [details] [diff] [review]
JS engine assertions, v2

I just rebased over Brian's context/runtime changes, so I might as well post an updated patch. It's pretty different.
Attachment #8834673 - Attachment is obsolete: true
Attachment #8834673 - Flags: review?(sphink)
Attachment #8834707 - Flags: review?(sphink)
Comment on attachment 8834672 [details] [diff] [review]
make a SystemGroup singleton

Review of attachment 8834672 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/build/nsLayoutStatics.cpp
@@ +152,5 @@
>                  "nsLayoutStatics", 1);
>  
>    nsresult rv;
>  
> +  SystemGroup::InitStatic();

Can we please try to put initialization of XPCOM things inside XPCOM itself?
Attachment #8834672 - Flags: review?(ehsan) → review+
Comment on attachment 8834672 [details] [diff] [review]
make a SystemGroup singleton

Review of attachment 8834672 [details] [diff] [review]:
-----------------------------------------------------------------

Sigh, didn't mean to r+ that; habit.
Attachment #8834672 - Flags: review+ → review?(ehsan)
Comment on attachment 8834667 [details] [diff] [review]
move Dispatcher.{cpp,h} to xpcom/threads

Review of attachment 8834667 [details] [diff] [review]:
-----------------------------------------------------------------

The deja vu is strong with this patch. :)

I'm still not totally clear why this needs to go in XPCOM.  But assuming it does eventually go in XPCOM, I think the comments need to be made more XPCOM-y and a little less DOM-y.  I see that TaskCategory.h talks about TabGroup and DocGroup already, but maybe Dispatcher.h could explain what TabGroup and DocGroup are more general instances of, rather than assuming they're concepts the reader would already understand?

Canceling review just because I think there are things we should discuss first.

::: xpcom/threads/Dispatcher.h
@@ +32,5 @@
> +// This trait should be attached to classes like nsIGlobalObject and nsIDocument
> +// that have a DocGroup or TabGroup attached to them. The methods here should
> +// delegate to the DocGroup or TabGroup. We can't use the Dispatcher class
> +// directly because it inherits from nsISupports.
> +class DispatcherTrait {

"Trait" seems like unusual terminology to use here; I tend to think of "trait" as being used for compile-time metaprogramming things.

It also seems weird to have DispatcherTrait that is meant to be inherited from and also Dispatcher as a base class with identical (or nearly so) virtual methods, but one is implemented here and the other is not.  I feel like there's some common functionality there that hasn't been pulled out.  Or the functionality of DispatcherTrait vs. Dispatcher hasn't been done correctly.

@@ +35,5 @@
> +// directly because it inherits from nsISupports.
> +class DispatcherTrait {
> +public:
> +  // This method may or may not be safe off of the main thread. For nsIDocument
> +  // it is safe. For nsIGlobalWindow it is not safe.

This sort of comment doesn't belong at the XPCOM level.  The comment is also peculiar because it's talking about a virtual method with a default implementation, but not actually describing the default implementation.  Instead, it's describing the (presumable) overrides of the virtual method someplace that doesn't even correspond to where the overrides would be implemented.

Also, the implementation on nsIDocument *is* thread-safe, while the implementation on nsIGlobalWindow is *not*?  That's pretty weird...

@@ +42,5 @@
> +                            already_AddRefed<nsIRunnable>&& aRunnable);
> +
> +  // This method may or may not be safe off of the main thread. For nsIDocument
> +  // it is safe. For nsIGlobalWindow it is not safe. The nsIEventTarget can
> +  // always be used off the main thread.

Likewise for this method.

@@ +51,5 @@
> +  virtual AbstractThread* AbstractMainThreadFor(TaskCategory aCategory);
> +};
> +
> +// Base class for DocGroup and TabGroup.
> +class Dispatcher : public nsISupports {

Reading this comment makes on wonder why this class is here in XPCOM rather than someplace else like DOM.  If we want this to be a generic sort of thing that can have implementations inside XPCOM (i.e. ValidatingDispatcher), I think we need to describe it in more generic terms.

Also, why are we inheriting from nsISupports here?

@@ +53,5 @@
> +
> +// Base class for DocGroup and TabGroup.
> +class Dispatcher : public nsISupports {
> +public:
> +  // This method is always safe to call off the main thread.

Perhaps:

// Implementations of this method must be safe to call off the main thread.

@@ +59,5 @@
> +                            TaskCategory aCategory,
> +                            already_AddRefed<nsIRunnable>&& aRunnable) = 0;
> +
> +  // This method is always safe to call off the main thread. The nsIEventTarget
> +  // can always be used off the main thread.

Likewise here.

@@ +63,5 @@
> +  // can always be used off the main thread.
> +  virtual nsIEventTarget* EventTargetFor(TaskCategory aCategory) const = 0;
> +
> +  // Must be called on the main thread. The AbstractThread can always be used
> +  // off the main thread.

Similarly:

// Implementations of this method must be called on the main thread.
// The returned AbstractThread can always be used off the main thread.

Though, thinking about those invariants, maybe it's not best to trust the implementors to DTRT, and we should instead:

class Dispatcher {
public:
  AbstractThread* AbstractMainThreadFor(...);

protected:
  virtual AbstractThread* AbstractMainThreadForImpl(...) = 0;
};

AbstractThread*
Dispatcher::AbstractMainThreadFor(...)
{
  MOZ_RELEASE_ASSERT(NS_IsMainThread());
  return AbstractMainThreadForImpl(...);
}

and similarly for the DispatcherTrait class above?  Then everybody gets the main thread enforcement for free--at least assuming they don't call the *Impl method directly, which would be a little peculiar.

@@ +68,5 @@
> +  virtual AbstractThread* AbstractMainThreadFor(TaskCategory aCategory) = 0;
> +
> +  // This method performs a safe cast. It returns null if |this| is not of the
> +  // requested type.
> +  virtual dom::TabGroup* AsTabGroup() { return nullptr; }

I'm not really excited about this being here.
Attachment #8834667 - Flags: review?(nfroyd)
(Assignee)

Comment 10

10 months ago
Created attachment 8835279 [details] [diff] [review]
move Dispatcher.{cpp,h} to xpcom/threads, v2

I made some changes in this version:

- I split DispatcherTrait into a separate file that will remain in DOM code.
- I rewrote the comments to make them a little less DOM-specific.
- Made the AbstractMainThreadForImpl change.

I'm contemplating changing the name of Dispatcher to SchedulerGroup or something like that.

As far as the AsTabGroup() method goes, here are the options:
- I could keep it, which would probably allow me to make Dispatcher stop inheriting from nsISupports. It could just have pure virtual AddRef/Release methods.
- Or I could implement the AsTabGroup() thing using a QI to TabGroup. That's a little gross since TabGroup is a concrete class, but whatever. In that case Dispatcher would need to inherit from nsISupports.

Let me know which one you prefer.
Attachment #8834667 - Attachment is obsolete: true
Attachment #8835279 - Flags: review?(nfroyd)
(In reply to Bill McCloskey (:billm) from comment #10)
> Created attachment 8835279 [details] [diff] [review]
> move Dispatcher.{cpp,h} to xpcom/threads, v2
> 
> I made some changes in this version:
> 
> - I split DispatcherTrait into a separate file that will remain in DOM code.

Thank you.  DispatcherTrait still seems a little weird to me (isn't the default implementation just SystemGroup:: methods by another name?), but whatever.

> I'm contemplating changing the name of Dispatcher to SchedulerGroup or
> something like that.

I like this name better.

> As far as the AsTabGroup() method goes, here are the options:
> - I could keep it, which would probably allow me to make Dispatcher stop
> inheriting from nsISupports. It could just have pure virtual AddRef/Release
> methods.
> - Or I could implement the AsTabGroup() thing using a QI to TabGroup. That's
> a little gross since TabGroup is a concrete class, but whatever. In that
> case Dispatcher would need to inherit from nsISupports.

I think we want to avoid QI to concrete classes, so let's go with the first option.  But then I wonder if Dispatcher really need to be refcounted...
Comment on attachment 8835279 [details] [diff] [review]
move Dispatcher.{cpp,h} to xpcom/threads, v2

Review of attachment 8835279 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with changes below; we can hash out any naming changes as follow-up patches in this bug or a separate bug.

::: xpcom/threads/Dispatcher.cpp
@@ +37,5 @@
> +
> +  Dispatcher* Dispatcher() const { return mDispatcher; }
> +
> +private:
> +  virtual ~DispatcherEventTarget() {}

I know there's inheritance going on, but this doesn't really need to be virtual, given that we always delete instances of this class from Release() and therefore are deleting the concrete type...right?

::: xpcom/threads/Dispatcher.h
@@ +28,5 @@
> +//
> +// A Dispatcher is an abstract class to represent a "group". Essentially the
> +// only functionality offered by a Dispatcher is the ability to dispatch
> +// runnables to the group. TabGroup, DocGroup, and SystemGroup are the concrete
> +// implementations of Dispatcher.

This is a much more informative comment, thank you.

@@ +46,5 @@
> +  AbstractThread* AbstractMainThreadFor(TaskCategory aCategory);
> +
> +  // This method performs a safe cast. It returns null if |this| is not of the
> +  // requested type.
> +  virtual dom::TabGroup* AsTabGroup() { return nullptr; }

Still not wild about this, but if we're going to have it, can we at least give this class pure virtual AddRef/Release methods, as you suggest, rather than inheriting from nsISupports?
Attachment #8835279 - Flags: review?(nfroyd) → review+

Comment 13

10 months ago
Comment on attachment 8834671 [details] [diff] [review]
move TabGroup code to ValidatingDispatcher

Review of attachment 8834671 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/TabGroup.cpp
@@ +169,5 @@
>  
>      // There is a RefPtr cycle TabGroup -> DispatcherEventTarget -> TabGroup. To
>      // avoid leaks, we need to break the chain somewhere. We shouldn't be using
>      // the ThrottledEventQueue for this TabGroup when no windows belong to it,
>      // so it's safe to null out the queue here.

I think you can remove the comment from here.

::: xpcom/threads/Dispatcher.cpp
@@ +36,5 @@
>    // Return non DocGroup version by default.
>    return AbstractThread::MainThread();
>  }
>  
> +/* DispatcherEventTarget */

Did you want to say something more here? :-)

@@ +162,5 @@
> +{
> +  for (size_t i = 0; i < size_t(TaskCategory::Count); i++) {
> +    TaskCategory category = static_cast<TaskCategory>(i);
> +    if (!aNeedValidation) {
> +      // The chrome TabGroup dispatches directly to the main thread. This means

Nit: system tabgroup?
Attachment #8834671 - Flags: review?(ehsan) → review+

Comment 14

10 months ago
Comment on attachment 8834672 [details] [diff] [review]
make a SystemGroup singleton

Review of attachment 8834672 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/SystemGroup.cpp
@@ +38,5 @@
> +SystemGroupImpl::InitStatic()
> +{
> +  MOZ_ASSERT(!sSingleton);
> +  MOZ_ASSERT(NS_IsMainThread());
> +  sSingleton = new SystemGroupImpl();

Can you use ClearOnShutdown() here to void the shutdown code path?

@@ +52,5 @@
> +/* static */ SystemGroupImpl*
> +SystemGroupImpl::Get()
> +{
> +  MOZ_ASSERT(sSingleton);
> +  return sSingleton;

Is there any particular reason why you aren't creating this lazily?  This way you can avoid the init code path as well, and address Nathan's comment indirectly!

@@ +55,5 @@
> +  MOZ_ASSERT(sSingleton);
> +  return sSingleton;
> +}
> +
> +NS_IMPL_ISUPPORTS(SystemGroupImpl, nsISupports)

Typically for singleton classes that are refcounted, we add a dummy AddRef() method that returns 2 and a dummy Release() method that returns 1 since the lifetime is managed explicitly.  I suggest you do the same here.
Attachment #8834672 - Flags: review?(ehsan) → review+

Comment 15

10 months ago
(In reply to Bill McCloskey (:billm) from comment #5)
> 1. The need to wrap every labeled runnable is kind of unfortunate. However,
> since Andreas is already doing that for telemetry, I figure we can fold
> these two together. I'll rebase on top of his changes assuming he lands
> first.

Yeah, this is unfortunate, but it's OK I think.

> 2. The lifetime of the mValidAccess pointer is a little worrisome.
> Basically, the TabGroup needs to live at least as long as any JS compartment
> associated with it. I think that should be true because:
>   i) the global object for the compartment is the inner window,
>   ii) inner windows keep their outer windows alive, and
>   iii) outer windows keep their TabGroup alive.

The first one isn't really true.  The global object and the inner window are separate things, and actually the lifetime of the global is tied to the outer window not the inner window, although in many cases code running under the global keeps the inner alive as well.

At any rate, your assertion that the lifetime is fine now seems correct, but I think this is the reverse way of doing things.  The real problem is that the JS engine cannot know where the boolean is if it's in Gecko, but if the boolean is in the JS engine then Gecko can look it up easily when it needs to read it or set it.

So I suggest making the boolean a member of the global (we can use a slot for efficient access) and getting rid of the valid access ptr APIs.  Instead, when you need to check this inside the JS engine, access the global object's valid access slot.  Then we can expose APIs such as js::LockGlobal() and js::UnlockGlobal() which would be what we would call inside ValidatingDispatcher::Runnable::Run() and similar places.  We can also have a getter: js::IsGlobalLocked() or some such for checking things on the Gecko side.  Then we never would have to worry about the lifetime of this pointer, instead we can rely on the existing setup that ensures the lifetime of the global object is correct.

What do you think?
Flags: needinfo?(wmccloskey)

Comment 16

10 months ago
Comment on attachment 8834677 [details] [diff] [review]
gecko assertions

Clearing this for now to make it not show up in my queue.
Attachment #8834677 - Flags: review?(ehsan)
(Assignee)

Comment 17

10 months ago
Created attachment 8835764 [details] [diff] [review]
stop passing TabGroup as aRequestor for FindItemWithName

In order to avoid making TabGroup inherit from nsISupports, I had to make this change to FindItemWithName. I finally decided that passing an extra bool argument here is the simplest thing to do.
Attachment #8835764 - Flags: review?(michael)
(Assignee)

Comment 18

10 months ago
(In reply to :Ehsan Akhgari from comment #14)
> Comment on attachment 8834672 [details] [diff] [review]
> make a SystemGroup singleton
> 
> Review of attachment 8834672 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: xpcom/threads/SystemGroup.cpp
> @@ +38,5 @@
> > +SystemGroupImpl::InitStatic()
> > +{
> > +  MOZ_ASSERT(!sSingleton);
> > +  MOZ_ASSERT(NS_IsMainThread());
> > +  sSingleton = new SystemGroupImpl();
> 
> Can you use ClearOnShutdown() here to void the shutdown code path?

No. Since SystemGroupImpl has a pointer to a DispatcherEventTarget, which points back to it, we need to break the cycle before shutting down. That's what the Shutdown method does.

> @@ +52,5 @@
> > +/* static */ SystemGroupImpl*
> > +SystemGroupImpl::Get()
> > +{
> > +  MOZ_ASSERT(sSingleton);
> > +  return sSingleton;
> 
> Is there any particular reason why you aren't creating this lazily?  This
> way you can avoid the init code path as well, and address Nathan's comment
> indirectly!

Since SystemGroup can be used on any thread, there would be a race for who would create it first. To avoid that, it seemed easiest to create it up front. I could also use a lock and create it lazily, but it seems better to avoid a lock if possible.

I moved the initialization/shutdown code to XPCOMInit.cpp.

> @@ +55,5 @@
> > +  MOZ_ASSERT(sSingleton);
> > +  return sSingleton;
> > +}
> > +
> > +NS_IMPL_ISUPPORTS(SystemGroupImpl, nsISupports)
> 
> Typically for singleton classes that are refcounted, we add a dummy AddRef()
> method that returns 2 and a dummy Release() method that returns 1 since the
> lifetime is managed explicitly.  I suggest you do the same here.

OK, did that.
(Assignee)

Comment 19

10 months ago
(In reply to :Ehsan Akhgari from comment #15)
> (In reply to Bill McCloskey (:billm) from comment #5)
> > 1. The need to wrap every labeled runnable is kind of unfortunate. However,
> > since Andreas is already doing that for telemetry, I figure we can fold
> > these two together. I'll rebase on top of his changes assuming he lands
> > first.
> 
> Yeah, this is unfortunate, but it's OK I think.
> 
> > 2. The lifetime of the mValidAccess pointer is a little worrisome.
> > Basically, the TabGroup needs to live at least as long as any JS compartment
> > associated with it. I think that should be true because:
> >   i) the global object for the compartment is the inner window,
> >   ii) inner windows keep their outer windows alive, and
> >   iii) outer windows keep their TabGroup alive.
> 
> The first one isn't really true.  The global object and the inner window are
> separate things, and actually the lifetime of the global is tied to the
> outer window not the inner window, although in many cases code running under
> the global keeps the inner alive as well.

I'm pretty sure that the JS engine's notion of a global is the inner window. You can see that here:
http://searchfox.org/mozilla-central/rev/afcf40f3eafd895611a839017730debb58a342a6/dom/base/nsGlobalWindow.cpp#2752
As long as any object in the compartment is alive, the JS engine should keep the global alive (and hence the inner window). At least that's how it worked a few years ago.

> At any rate, your assertion that the lifetime is fine now seems correct, but
> I think this is the reverse way of doing things.  The real problem is that
> the JS engine cannot know where the boolean is if it's in Gecko, but if the
> boolean is in the JS engine then Gecko can look it up easily when it needs
> to read it or set it.
> 
> So I suggest making the boolean a member of the global (we can use a slot
> for efficient access) and getting rid of the valid access ptr APIs. 
> Instead, when you need to check this inside the JS engine, access the global
> object's valid access slot.  Then we can expose APIs such as
> js::LockGlobal() and js::UnlockGlobal() which would be what we would call
> inside ValidatingDispatcher::Runnable::Run() and similar places.  We can
> also have a getter: js::IsGlobalLocked() or some such for checking things on
> the Gecko side.  Then we never would have to worry about the lifetime of
> this pointer, instead we can rely on the existing setup that ensures the
> lifetime of the global object is correct.
> 
> What do you think?

Originally I considered doing it this way. However, whenever we enter and exit a runnable, we would have to iterate over all the globals represented by a TabGroup to set their valid field to true or false (depending on whether we're entering or exiting). Some pages use a lot of iframes, and it seems worrisome to me that we would have to do that. The nice thing about the scheme in the patch is that we only need to set one field (on the TabGroup).

Another option for the future would be to move the field to the ZoneGroup. This is a new concept that bhackett introduced recently. Eventually, ZoneGroup will be the JS engine's internal notion of a TabGroup. However, I'd rather not do that now since it would force me to do all the work of setting up ZoneGroups so that they match our TabGroups.
Flags: needinfo?(wmccloskey)

Comment 20

10 months ago
Comment on attachment 8835764 [details] [diff] [review]
stop passing TabGroup as aRequestor for FindItemWithName

Review of attachment 8835764 [details] [diff] [review]:
-----------------------------------------------------------------

I'm definitely not the biggest fan of the name `aSkipTabGroup` but I can't think of a better one, so...
Attachment #8835764 - Flags: review?(michael) → review+

Comment 21

10 months ago
(In reply to Bill McCloskey (:billm) from comment #18)
> > ::: xpcom/threads/SystemGroup.cpp
> > @@ +38,5 @@
> > > +SystemGroupImpl::InitStatic()
> > > +{
> > > +  MOZ_ASSERT(!sSingleton);
> > > +  MOZ_ASSERT(NS_IsMainThread());
> > > +  sSingleton = new SystemGroupImpl();
> > 
> > Can you use ClearOnShutdown() here to void the shutdown code path?
> 
> No. Since SystemGroupImpl has a pointer to a DispatcherEventTarget, which
> points back to it, we need to break the cycle before shutting down. That's
> what the Shutdown method does.

Fair enough.

> > @@ +52,5 @@
> > > +/* static */ SystemGroupImpl*
> > > +SystemGroupImpl::Get()
> > > +{
> > > +  MOZ_ASSERT(sSingleton);
> > > +  return sSingleton;
> > 
> > Is there any particular reason why you aren't creating this lazily?  This
> > way you can avoid the init code path as well, and address Nathan's comment
> > indirectly!
> 
> Since SystemGroup can be used on any thread, there would be a race for who
> would create it first. To avoid that, it seemed easiest to create it up
> front. I could also use a lock and create it lazily, but it seems better to
> avoid a lock if possible.

Makes sense.  I think it's better to keep it eager then, but please add a comment explaining why the initialization can't be lazy.

> I moved the initialization/shutdown code to XPCOMInit.cpp.

Thanks.

Comment 22

10 months ago
(In reply to Bill McCloskey (:billm) from comment #19)
> > Yeah, this is unfortunate, but it's OK I think.
> > 
> > > 2. The lifetime of the mValidAccess pointer is a little worrisome.
> > > Basically, the TabGroup needs to live at least as long as any JS compartment
> > > associated with it. I think that should be true because:
> > >   i) the global object for the compartment is the inner window,
> > >   ii) inner windows keep their outer windows alive, and
> > >   iii) outer windows keep their TabGroup alive.
> > 
> > The first one isn't really true.  The global object and the inner window are
> > separate things, and actually the lifetime of the global is tied to the
> > outer window not the inner window, although in many cases code running under
> > the global keeps the inner alive as well.
> 
> I'm pretty sure that the JS engine's notion of a global is the inner window.
> You can see that here:
> http://searchfox.org/mozilla-central/rev/
> afcf40f3eafd895611a839017730debb58a342a6/dom/base/nsGlobalWindow.cpp#2752
> As long as any object in the compartment is alive, the JS engine should keep
> the global alive (and hence the inner window). At least that's how it worked
> a few years ago.

Now I'm beginning to doubt my own understanding...  :-)  Here is what I know.

On the Gecko side, the global object is held alive by nsJSContext.  That in itself is held alive by nsGlobalWindow::mContext on the outer window.  It's initialized here: <http://searchfox.org/mozilla-central/rev/d3307f19d5dac31d7d36fc206b00b686de82eee4/dom/base/nsGlobalWindow.cpp#2347>.  I think you're right in that the JS engine's global corresponds to the inner window and it keeps the inner alive, but the global itself I think is held on to by the outer window.

Boris, can you please look over this?  I may be missing something here.

> > At any rate, your assertion that the lifetime is fine now seems correct, but
> > I think this is the reverse way of doing things.  The real problem is that
> > the JS engine cannot know where the boolean is if it's in Gecko, but if the
> > boolean is in the JS engine then Gecko can look it up easily when it needs
> > to read it or set it.
> > 
> > So I suggest making the boolean a member of the global (we can use a slot
> > for efficient access) and getting rid of the valid access ptr APIs. 
> > Instead, when you need to check this inside the JS engine, access the global
> > object's valid access slot.  Then we can expose APIs such as
> > js::LockGlobal() and js::UnlockGlobal() which would be what we would call
> > inside ValidatingDispatcher::Runnable::Run() and similar places.  We can
> > also have a getter: js::IsGlobalLocked() or some such for checking things on
> > the Gecko side.  Then we never would have to worry about the lifetime of
> > this pointer, instead we can rely on the existing setup that ensures the
> > lifetime of the global object is correct.
> > 
> > What do you think?
> 
> Originally I considered doing it this way. However, whenever we enter and
> exit a runnable, we would have to iterate over all the globals represented
> by a TabGroup to set their valid field to true or false (depending on
> whether we're entering or exiting). Some pages use a lot of iframes, and it
> seems worrisome to me that we would have to do that. The nice thing about
> the scheme in the patch is that we only need to set one field (on the
> TabGroup).

Yeah that's a good point.

> Another option for the future would be to move the field to the ZoneGroup.
> This is a new concept that bhackett introduced recently. Eventually,
> ZoneGroup will be the JS engine's internal notion of a TabGroup. However,
> I'd rather not do that now since it would force me to do all the work of
> setting up ZoneGroups so that they match our TabGroups.

Yeah that's fair.

Maybe I'm just overthinking this.  AFAICT the only dereference of this variable happens under a MOZ_DIAGNOSTIC_ASSERT.  If that's correct, then the suspicious pointer may be fine if it passes try...
Flags: needinfo?(bzbarsky)
(Assignee)

Comment 23

10 months ago
(In reply to :Ehsan Akhgari from comment #22)
> On the Gecko side, the global object is held alive by nsJSContext.  That in
> itself is held alive by nsGlobalWindow::mContext on the outer window.  It's
> initialized here:
> <http://searchfox.org/mozilla-central/rev/
> d3307f19d5dac31d7d36fc206b00b686de82eee4/dom/base/nsGlobalWindow.cpp#2347>. 
> I think you're right in that the JS engine's global corresponds to the inner
> window and it keeps the inner alive, but the global itself I think is held
> on to by the outer window.

The important invariant we need is that the JS engine will never dereference the pointer after that TabGroup has died. To prove that, we can assume that the JS compartment is alive (otherwise we would never try to dereference the pointer). If the compartment is alive, then the JS engine ensures the JS engine global is alive. The global is a wrapper around the inner window, and it keeps a ref on the inner window. The inner window keeps the TabGroup alive.

I think you're right that the outer window is what keeps the compartment alive, and that happens through the nsJSContext. But I don't think we need to know that here.
Boy, is this complicated to describe given the ambiguity of terms like "global"....

> On the Gecko side, the global object is held alive by nsJSContext. 

The nsJSContext holds a strong ref to the outer nsGlobalWindow.  It also holds a ref to the JSObject* for the WindowProxy for the current Window.  Both of these links are cycle-collected.

The outer window holds a strong ref to mContext.  Again, cycle-collected.

The WindowProxy keeps its global (the Window JSObject*) alive, because every JSObject* keeps its global (in the JSObject* sense, not the nsIGlobalObject* sense) alive.

The WindowProxy holds a weak ref to the outer window, afaict.

The Window holds a _strong_ ref to the inner window, which itself holds a strong ref to the outer window.  I don't recall whether the inner window holds a strong ref to the Window or not.

Oh, and the docshell keeps the outer window alive.

So in general, the docshell keeps alive the outer window, which keeps alive the nsJSContext, which keeps alive the WindowProxy, which keeps alive the Window, which keeps alive the current inner window.

A given inner window and Window and whatnot can certainly die without the corresponding outer window dying; that's what needs to happen on navigation.

And after all that, I'm not sure whether I answered your question...
Flags: needinfo?(bzbarsky)

Comment 25

10 months ago
I think comment 23 addresses my concern completely.  Thanks Boris!

Comment 26

10 months ago
Comment on attachment 8834677 [details] [diff] [review]
gecko assertions

Review of attachment 8834677 [details] [diff] [review]:
-----------------------------------------------------------------

Should we also do this when running timeouts?

::: xpcom/threads/Dispatcher.cpp
@@ +247,5 @@
> +  sRunningDispatcher = aType == StartValidation ? this : nullptr;
> +  mAccessValid = aType == StartValidation;
> +
> +  dom::AutoJSAPI jsapi;
> +  jsapi.Init();

I think we need to check whether AutoJSAPI::Init() succeeds.
Attachment #8834677 - Flags: review+
Comment on attachment 8834707 [details] [diff] [review]
JS engine assertions, v2

Review of attachment 8834707 [details] [diff] [review]:
-----------------------------------------------------------------

This is only checking when running (well, interpreting) code in a compartment. Would it be possible to also check when entering a compartment, at least for DEBUG? (Heck, you could check when unwrapping too, but that might be a nuisance.)

::: js/src/jsfriendapi.h
@@ +2895,5 @@
>  
> +// Turn on assertions so that we check the compartment's validAccessPtr before
> +// running code for that compartment.
> +extern JS_FRIEND_API(void)
> +EnableAccessValidation(JSContext* cx, bool enabled);

I'm confused about what is associated with the compartment vs the context.

The context has a flag that says whether to check an access pointer for every compartment it touches. But the access pointer is stored with the compartment. So what happens if you turn access validation on for a compartment, but don't set the access pointer for all of the compartments you end up touching with that context? Similarly, what happens if your context enters a compartment that does not have a pointer set? Should it assert?

From the code, it appears that you just have to set the access pointer for compartments you actually want checked. Any compartment without an access pointer will allow the access, even if the context says to check. Are those the semantics you want? Either way, it needs to be documented. The current comment seems to imply that the compartment will have a validAccessPtr, and doesn't mention either what will happen if it doesn't, or how the user is expected to ensure that all such compartments will have them.

@@ +2898,5 @@
> +extern JS_FRIEND_API(void)
> +EnableAccessValidation(JSContext* cx, bool enabled);
> +
> +// Whenever we run code for |global|, check that *accessp is true (assuming
> +// access validation has been enabled using the call above).

When I read this, I wonder what 'check' means. DEBUG assert? Or does it raise some sort of runtime error or something? The answer is a diagnostic assert. Maybe just s/check/assert/?

@@ +2900,5 @@
> +
> +// Whenever we run code for |global|, check that *accessp is true (assuming
> +// access validation has been enabled using the call above).
> +extern JS_FRIEND_API(void)
> +SetCompartmentValidAccessPtr(JSContext* cx, JS::HandleObject global, bool* accessp);

Passing in a bool* to be accessed at various random times later makes me a little nervous -- can you comment what the conditions are for this pointer? (As in, how long it needs to live, and what threads are allowed to access it.)

::: js/src/vm/Interpreter.cpp
@@ +363,5 @@
>  
>      // Since any script can conceivably GC, make sure it's safe to do so.
>      cx->verifyIsSafeToGC();
>  
> +

extra blank line
(Assignee)

Comment 28

10 months ago
(In reply to Steve Fink [:sfink] [:s:] from comment #27)
> Comment on attachment 8834707 [details] [diff] [review]
> JS engine assertions, v2
> 
> Review of attachment 8834707 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is only checking when running (well, interpreting) code in a
> compartment. 

I think RunScript is called even if we end up using a JIT.

> Would it be possible to also check when entering a compartment,
> at least for DEBUG? (Heck, you could check when unwrapping too, but that
> might be a nuisance.)

That's a much stricter assertion. Eventually we will need to assert that since Brian's patches require it. But I'd like to hold the line at RunScript first.

> ::: js/src/jsfriendapi.h
> @@ +2895,5 @@
> >  
> > +// Turn on assertions so that we check the compartment's validAccessPtr before
> > +// running code for that compartment.
> > +extern JS_FRIEND_API(void)
> > +EnableAccessValidation(JSContext* cx, bool enabled);
> 
> I'm confused about what is associated with the compartment vs the context.
> 
> The context has a flag that says whether to check an access pointer for
> every compartment it touches. But the access pointer is stored with the
> compartment. So what happens if you turn access validation on for a
> compartment, but don't set the access pointer for all of the compartments
> you end up touching with that context? Similarly, what happens if your
> context enters a compartment that does not have a pointer set? Should it
> assert?
> 
> From the code, it appears that you just have to set the access pointer for
> compartments you actually want checked. Any compartment without an access
> pointer will allow the access, even if the context says to check. Are those
> the semantics you want?

Yes.

I'll make changes to address the other comments.
(Assignee)

Comment 29

10 months ago
Created attachment 8836234 [details] [diff] [review]
JS engine assertions, v3
Attachment #8834707 - Attachment is obsolete: true
Attachment #8834707 - Flags: review?(sphink)
Attachment #8836234 - Flags: review?(sphink)
(Assignee)

Comment 30

10 months ago
Created attachment 8836239 [details] [diff] [review]
workarounds

Hi Olli, this patch has the two workarounds we've been discussing over IRC.

The NS_ERROR_DOCSHELL_DYING code handles the case where a docshell for an iframe loads after it has been removed from its parent by nsDocLoader::DestroyChildren. In this case, the docshell ends up with a fresh TabGroup and we get assertions when we try to tear it down.

There's a comment to explain the IgnoreDocGroupMismatches() thing.
Attachment #8836239 - Flags: review?(bugs)
Comment on attachment 8836234 [details] [diff] [review]
JS engine assertions, v3

Review of attachment 8836234 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good as a mechanism. I'm going to read the other patches to understand how you choose compartments to mark.
Attachment #8836234 - Flags: review?(sphink) → review+
(In reply to Bill McCloskey (:billm) from comment #28)
> (In reply to Steve Fink [:sfink] [:s:] from comment #27)
> > Comment on attachment 8834707 [details] [diff] [review]
> > JS engine assertions, v2
> > 
> > Review of attachment 8834707 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > This is only checking when running (well, interpreting) code in a
> > compartment. 
> 
> I think RunScript is called even if we end up using a JIT.

Hm. Looks like that's true. I thought there was a mechanism for cross-compartment calls from JIT code to JIT code, but it looks like not. I guess we currently depend on that to hold for other things right now (eg the profiler uses it as a marker that we're invoking JS code), so the patch here is fine, but I do wonder if we need to separate out these sorts of invariant checks and markers so that we *can* optimize this sort of thing away. Some things would need to be checked or done for such optimizations, and some wouldn't.
(Assignee)

Comment 33

10 months ago
Created attachment 8836252 [details] [diff] [review]
fix webaudio test

This just showed up in my latest try push. During a cycle collection, we call AudioNode::DestroyMediaStream. That notifies an observer ("webaudio-node-demise"), which invokes web content (the test). Since the cycle collector runs with the system group, we're not allowed to touch web content from it.

I fixed the problem by dispatching the observer notification off its own runnable. I expect we may see a few more of these as we get more testing.
Attachment #8836252 - Flags: review?(ehsan)
Hmm, yeah you may have a bit of a bad time here, as pretty much any cycle collected class can have its destructor called from within cycle collection, which can do who knows what.
(Assignee)

Comment 35

10 months ago
(In reply to Andrew McCreight [:mccr8] from comment #34)
> Hmm, yeah you may have a bit of a bad time here, as pretty much any cycle
> collected class can have its destructor called from within cycle collection,
> which can do who knows what.

I'm cautiously optimistic. So far I've only seen a few issues on try, and this one is only in a test.

Comment 36

10 months ago
Comment on attachment 8836252 [details] [diff] [review]
fix webaudio test

Review of attachment 8836252 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah this notification is only used for devtools, so this seems like the right solution.

::: testing/specialpowers/content/specialpowersAPI.js
@@ +1225,5 @@
> +  _asyncObservers: new WeakMap(),
> +  addAsyncObserver: function(obs, notification, weak) {
> +    obs = Cu.waiveXrays(obs);
> +    if (typeof obs == 'object' && obs.observe.name != 'SpecialPowersCallbackWrapper')
> +      obs.observe = wrapCallback(obs.observe);

Nit: braces around if please.
Attachment #8836252 - Flags: review?(ehsan) → review+
Comment on attachment 8834671 [details] [diff] [review]
move TabGroup code to ValidatingDispatcher

>+class ValidatingDispatcher : public Dispatcher {
Nit, { goes to its own line with class declarations.
Comment on attachment 8836239 [details] [diff] [review]
workarounds

>@@ -7983,16 +7983,25 @@ nsDocShell::EnsureContentViewer()
> 
>   if (NS_SUCCEEDED(rv)) {
>     nsCOMPtr<nsIDocument> doc(GetDocument());
>     NS_ASSERTION(doc,
>                  "Should have doc if CreateAboutBlankContentViewer "
>                  "succeeded!");
> 
>     doc->SetIsInitialDocument(true);
>+
>+    // Documents created using EnsureContentViewer are typically transient
>+    // placeholders created by framescripts before content has a chance to
>+    // load.
I have no idea whether "typically" is valid here. EnsureContentViewer can be called by many things.
Perhaps say "are for example transient ..." or something

> In some cases, window.open(..., "noopener") will create such a
>+    // document (in a new TabGroup) and then synchronously tear it down, firing
>+    // a "pagehide" event. Doing so violates our assertions about
>+    // DocGroups. It's easier to silence the assertion here than to avoid
>+    // creating the extra document.
>+    doc->IgnoreDocGroupMismatches();
Please file a followup to fix this properly. 


> nsDocument::BeginUpdate(nsUpdateType aUpdateType)
> {
>   // If the document is going away, then it's probably okay to do things to it
>   // in the wrong DocGroup. We're unlikely to run JS or do anything else
>   // observable at this point. We reach this point when cycle collecting a
>   // <link> element and the unlink code removes a style sheet.
>-  if (mDocGroup && !mIsGoingAway) {
>+  if (mDocGroup && !mIsGoingAway && !mIgnoreDocGroupMismatches) {
>     mDocGroup->ValidateAccess();
>   }
I don't know what ValidateAccess does, but hopefully something very very fast, since this is super hot code.


> nsDocument::GetEventTargetParent(EventChainPreVisitor& aVisitor)
> {
>-  if (mDocGroup && aVisitor.mEvent->mMessage != eVoidEvent) {
>+  if (mDocGroup && aVisitor.mEvent->mMessage != eVoidEvent &&
>+      !mIgnoreDocGroupMismatches) {
>     mDocGroup->ValidateAccess();
>   }
ditto.

I don't pretend to understand what ValidateAccess does.
Attachment #8836239 - Flags: review?(bugs) → review+
(Assignee)

Updated

10 months ago
Blocks: 1339222
(Assignee)

Comment 39

10 months ago
Filed bug 1339222 for the follow-up here.

ValidateAccess just asserts that a field is true. I moved it to an inline function to make sure it's as fast as it can be.

Comment 40

10 months ago
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3092fa33385e
Remove the need for TabGroup to be an nsISupports for DocShell::FindItemWithName (r=mystor)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1021af9fea00
Move Dispatcher.{cpp,h} to xpcom/threads (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c141bd7f221
Move some TabGroup code to ValidatingDispatcher (r=ehsan)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ed020f53dc9
Make a SystemGroup singleton (r=ehsan)
https://hg.mozilla.org/integration/mozilla-inbound/rev/bdc62272ba78
JS engine changes for compartment validation (r=sfink)
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5b50410e0d2
Assert that runnables labeled with a given TabGroup never touch other TabGroups (r=ehsan)
https://hg.mozilla.org/integration/mozilla-inbound/rev/c57837e52c3d
Workarounds to avoid TabGroup assertions in edge cases (r=smaug)
https://hg.mozilla.org/integration/mozilla-inbound/rev/6fe6f63a39f1
Avoid SystemGroup assertion during WebAudio test (r=ehsan)

Comment 41

10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3092fa33385e
https://hg.mozilla.org/mozilla-central/rev/1021af9fea00
https://hg.mozilla.org/mozilla-central/rev/7c141bd7f221
https://hg.mozilla.org/mozilla-central/rev/1ed020f53dc9
https://hg.mozilla.org/mozilla-central/rev/bdc62272ba78
https://hg.mozilla.org/mozilla-central/rev/f5b50410e0d2
https://hg.mozilla.org/mozilla-central/rev/c57837e52c3d
https://hg.mozilla.org/mozilla-central/rev/6fe6f63a39f1
Status: NEW → RESOLVED
Last Resolved: 10 months ago
status-firefox54: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54

Updated

9 months ago
Depends on: 1347117

Updated

9 months ago
No longer depends on: 1347117
Depends on: 1410091
You need to log in before you can comment on or make changes to this bug.