Closed Bug 1072144 Opened 10 years ago Closed 8 years ago

Use AutoJSAPI/AutoEntryScript on workers

Categories

(Core :: DOM: Workers, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox47 --- fixed

People

(Reporter: bholley, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

Attachments

(7 files, 3 obsolete files)

14.89 KB, text/plain
Details
1.26 KB, patch
khuey
: review+
Details | Diff | Splinter Review
1.11 KB, patch
khuey
: review+
Details | Diff | Splinter Review
5.99 KB, patch
khuey
: review+
Details | Diff | Splinter Review
11.64 KB, patch
khuey
: review+
Details | Diff | Splinter Review
6.02 KB, patch
khuey
: review+
Details | Diff | Splinter Review
4.35 KB, patch
khuey
: review+
Details | Diff | Splinter Review
We use these for the main-thread pieces, but we still need to more-fully-integrate workers into the setup here before we can hoist error reporting into AutoJSAPI.
Blocks: 1072150
I'm waiting on this try run https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cd6108d8f2d5 to make sure no existing stuff broke.
I've tried it in a Fetch API test, but I'm not sure how to add an automated test specifically for AutoJSAPI::TakeOwnershipOfErrorReporting to test the case where we call TakeOwnership...() but don't actually steal the exception, leading to the dtor trying to do so (although I've tested it locally).
Attachment #8532321 - Flags: review?(bobbyholley)
Assignee: nobody → nsm.nikhil
Status: NEW → ASSIGNED
Shouldn't have mixed the promise try run with this, will push another try run tomorrow (hotel wifi doesn't allow pushes)
Attachment #8532321 - Attachment is obsolete: true
Attachment #8532321 - Flags: review?(bobbyholley)
Comment on attachment 8532321 [details] [diff] [review]
Support AutoJSAPI::TakeOwnershipOfErrorReporting on workers

Review of attachment 8532321 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds like this patch is obsolete anyway per IRL discussion.

::: dom/base/ScriptSettings.cpp
@@ +320,4 @@
>        JS::Rooted<JS::Value> exn(cx());
>        js::ErrorReport jsReport(cx());
>        if (StealException(&exn) && jsReport.init(cx(), exn)) {
> +        if (NS_IsMainThread()) {

use mIsMainThread here.

::: js/xpconnect/src/nsXPConnect.cpp
@@ +220,5 @@
>  static PRLogModuleInfo* gJSDiagnostics;
>  #endif
>  
> +namespace {
> +} // anonymous namespace

Uh, I think you want to remove this.

@@ +236,5 @@
> +}
> +
> +void
> +xpc::ErrorReport::LogToConsoleInternal(bool aFromWorker)
> +{

This should assert main-threadedness, right?

@@ +257,5 @@
>          fprintf(stderr, "%s\n", error.get());
>          fflush(stderr);
>      }
>  
> +    // Not sure if this is thread safe.

Why does it matter?

@@ +282,5 @@
>  
> +    nsresult rv;
> +    if (aFromWorker) {
> +        rv = errorObject->Init(mErrorMsg, mFileName, mSourceLine, mLineNumber,
> +                               mColumn, mFlags, mCategory.get());

On workers right now, we properly invoke InitWithWindowID with aInnerWindowId, right? So it seems like we need to store that somehow.
Attachment #8532321 - Attachment is obsolete: false
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=d4b36aabda3c

This is a simplified version. On workers, errors are always handed of to the worker reporter, so there are none of the XPConnect changes.
I can't use mIsMainThread, since that is only available on AutoEntryScript. I hope this is congruent with what was discussed with :bholley and :bent.
Attachment #8534373 - Flags: review?(bobbyholley)
Attachment #8534373 - Flags: feedback?(bent.mozilla)
Ah, there is a ServiceWorker test failure, but I'm just going to disable the test, since that is not how we do things on maple anyway.
Comment on attachment 8534373 [details] [diff] [review]
Support AutoJSAPI::TakeOwnershipOfErrorReporting on workers

Review of attachment 8534373 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this.

::: dom/base/ScriptSettings.cpp
@@ +327,4 @@
>        JS::Rooted<JS::Value> exn(cx());
>        js::ErrorReport jsReport(cx());
>        if (StealException(&exn) && jsReport.init(cx(), exn)) {
> +        if (NS_IsMainThread()) {

Please use mIsMainThread here to avoid the TLS lookup.

@@ +509,5 @@
>    JSRuntime *rt = JS_GetRuntime(cx());
>    mOldDontReportUncaught = JS::ContextOptionsRef(cx()).dontReportUncaught();
>    JS::ContextOptionsRef(cx()).setDontReportUncaught(true);
> +  // Workers have their own error reporting handler which deals with warnings
> +  // too.

This isn't quite right. The point of this operation is that, with TakeOwnershipOfErrorReporting, the JSErrorReporter is never supposed to receive any non-warning reports.

It seems like, for this patch to really be correct, it needs to include the worker-side changes too, which get rid of the JS_ReportPendingException calls and invoke TakeOwnershipOfErrorReporting on the aes/jsapi.

@@ +510,5 @@
>    mOldDontReportUncaught = JS::ContextOptionsRef(cx()).dontReportUncaught();
>    JS::ContextOptionsRef(cx()).setDontReportUncaught(true);
> +  // Workers have their own error reporting handler which deals with warnings
> +  // too.
> +  if (NS_IsMainThread()) {

mIsMainThread.
Attachment #8534373 - Flags: review?(bobbyholley) → review-
> @@ +509,5 @@
> >    JSRuntime *rt = JS_GetRuntime(cx());
> >    mOldDontReportUncaught = JS::ContextOptionsRef(cx()).dontReportUncaught();
> >    JS::ContextOptionsRef(cx()).setDontReportUncaught(true);
> > +  // Workers have their own error reporting handler which deals with warnings
> > +  // too.
> 
> This isn't quite right. The point of this operation is that, with
> TakeOwnershipOfErrorReporting, the JSErrorReporter is never supposed to
> receive any non-warning reports.

The worker error reporting (WorkerPrivate::ReportError) handles warnings appropriately, so I think a compromise would be to use it, but assert that RuntimeService.cpp ErrorReporter() receives only warnings.

There is a possibility that the assertion cannot be made until Bug 1107777 is fixed (at the least), so that ReportError() does not immediately call the error reporter. Should I fold that bug into this bug in such a case? I'll know once I run tests.

> 
> It seems like, for this patch to really be correct, it needs to include the
> worker-side changes too, which get rid of the JS_ReportPendingException
> calls and invoke TakeOwnershipOfErrorReporting on the aes/jsapi.
> 

Do you mean that every caller of JS_ReportPendingException in dom/workers (there are several) should instead create an AutoJSAPI and call TakeOwnership...(), and where it would call ReportPendingException, it will just let the dtor take care of it?
Flags: needinfo?(bobbyholley)
(In reply to Nikhil Marathe [:nsm] (away Nov 26-Nov 30) from comment #8)
> > @@ +509,5 @@
> > >    JSRuntime *rt = JS_GetRuntime(cx());
> > >    mOldDontReportUncaught = JS::ContextOptionsRef(cx()).dontReportUncaught();
> > >    JS::ContextOptionsRef(cx()).setDontReportUncaught(true);
> > > +  // Workers have their own error reporting handler which deals with warnings
> > > +  // too.
> > 
> > This isn't quite right. The point of this operation is that, with
> > TakeOwnershipOfErrorReporting, the JSErrorReporter is never supposed to
> > receive any non-warning reports.
> 
> The worker error reporting (WorkerPrivate::ReportError) handles warnings
> appropriately, so I think a compromise would be to use it, but assert that
> RuntimeService.cpp ErrorReporter() receives only warnings.
> 
> There is a possibility that the assertion cannot be made until Bug 1107777
> is fixed (at the least), so that ReportError() does not immediately call the
> error reporter. Should I fold that bug into this bug in such a case? I'll
> know once I run tests.
> 
> > 
> > It seems like, for this patch to really be correct, it needs to include the
> > worker-side changes too, which get rid of the JS_ReportPendingException
> > calls and invoke TakeOwnershipOfErrorReporting on the aes/jsapi.
> > 
> 
> Do you mean that every caller of JS_ReportPendingException in dom/workers
> (there are several) should instead create an AutoJSAPI and call
> TakeOwnership...(), and where it would call ReportPendingException, it will
> just let the dtor take care of it?

workers uses JS_ReportPendingException() extensively, and often with the actual call happening in a separate function from the one that performs the JS stuff that could lead to an exception occuring (Dispatch/PostDispatch, WorkerRun/PostRun). In those cases, is it worth creating an AutoJSAPI in the Post* cases just to report the exception in a new way?
(In reply to Nikhil Marathe [:nsm] (away Nov 26-Nov 30) from comment #8)
> The worker error reporting (WorkerPrivate::ReportError) handles warnings
> appropriately, so I think a compromise would be to use it, but assert that
> RuntimeService.cpp ErrorReporter() receives only warnings.

Yeah that makes sense.

> There is a possibility that the assertion cannot be made until Bug 1107777
> is fixed (at the least), so that ReportError() does not immediately call the
> error reporter. Should I fold that bug into this bug in such a case? I'll
> know once I run tests.

Can't bug 1107777 land first? Or is there a circular dependency somehow?

> Do you mean that every caller of JS_ReportPendingException in dom/workers
> (there are several) should instead create an AutoJSAPI and call
> TakeOwnership...(), and where it would call ReportPendingException, it will
> just let the dtor take care of it?

Precisely.

(In reply to Nikhil Marathe [:nsm] (away Nov 26-Nov 30) from comment #9)
> workers uses JS_ReportPendingException() extensively

I have no idea why the calls in URL.cpp exist - looks like they were added in bug 887364. Probably worth asking Andrea why he did that.

> , and often with the
> actual call happening in a separate function from the one that performs the
> JS stuff that could lead to an exception occuring (Dispatch/PostDispatch,
> WorkerRun/PostRun). In those cases, is it worth creating an AutoJSAPI in the
> Post* cases just to report the exception in a new way?

In the Run/PostRun case, it seems perfect, because PostRun() is currently a pseudo-tail-call at the end of Run(), and the AutoJSAPI is stack-scoped to Run - so it seems like we just remove the call to JS_ReportPendingException in PostRun and then the destructor does the work at roughly the same time.

In the PostDispatch() case, it seems like we should just let the exception bubble up to whoever passed aCx to Dispatch().

The basic idea here is that whoever initial gets the JSContext* has to create an AutoJSAPI to get it, and that's a perfect place to do the error reporting.
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) from comment #10)

> 
> > There is a possibility that the assertion cannot be made until Bug 1107777
> > is fixed (at the least), so that ReportError() does not immediately call the
> > error reporter. Should I fold that bug into this bug in such a case? I'll
> > know once I run tests.
> 
> Can't bug 1107777 land first? Or is there a circular dependency somehow?

Yes I can land it first as long as try is green.


> 
> > , and often with the
> > actual call happening in a separate function from the one that performs the
> > JS stuff that could lead to an exception occuring (Dispatch/PostDispatch,
> > WorkerRun/PostRun). In those cases, is it worth creating an AutoJSAPI in the
> > Post* cases just to report the exception in a new way?
> 
> In the Run/PostRun case, it seems perfect, because PostRun() is currently a
> pseudo-tail-call at the end of Run(), and the AutoJSAPI is stack-scoped to
> Run - so it seems like we just remove the call to JS_ReportPendingException
> in PostRun and then the destructor does the work at roughly the same time.
> 

PostRun() is useful precisely because WorkerRunnables can have custom behaviour without modifying WorkerRunnable::Run() which deals with setting up the AutoJSAPI. I don't think simply calling TakeOwnership...() is sufficient there. Workers can have nested event loops running where the outermost loop is blocked on the inner loops. Hoisting error handling into the WorkerRunnable::Run AutoJSAPI trips it up in things like importScripts(). Exceptions due to importScripts() are left on the cx by ScriptExecutorRunnable::PostRun() overriding the default behaviour of reporting the exception. This allows it to bubble up to the WebIDL binding, where it is dealt with. Making the change suggested above prevents this sort of overriding. We could keep PostRun() around and pass it the AutoJSAPI, and in the default case it would set the TakeOwnership...() flag, but this would let exceptions from WorkerRun() slip through to the error handler. I also doubt I have the bandwidth to track down all the breakage that could result from such a major change when WorkerRunnables are used in all the worker APIs, but we can deal with that once we have an approach to fix the above problem.
(In reply to Nikhil Marathe [:nsm] (away Nov 26-Nov 30) from comment #11)
> PostRun() is useful precisely because WorkerRunnables can have custom
> behaviour without modifying WorkerRunnable::Run() which deals with setting
> up the AutoJSAPI. I don't think simply calling TakeOwnership...() is
> sufficient there. Workers can have nested event loops running where the
> outermost loop is blocked on the inner loops. Hoisting error handling into
> the WorkerRunnable::Run AutoJSAPI trips it up in things like
> importScripts(). Exceptions due to importScripts() are left on the cx by
> ScriptExecutorRunnable::PostRun() overriding the default behaviour of
> reporting the exception. This allows it to bubble up to the WebIDL binding,
> where it is dealt with.

So in the sync loop case, there are two AutoJSAPIs on the stack, right? If that's the case, it seems like the correct solution is to have the inner one invoke StealException, and stick the value somewhere where the outer AutoJSAPI can find it and process it. Alternatively, ScriptExecutorRunnable could set a flag that would tell the runnable dispatcher to use AutoInheritJSAPI.

 Making the change suggested above prevents this sort
> of overriding. We could keep PostRun() around and pass it the AutoJSAPI, and
> in the default case it would set the TakeOwnership...() flag, but this would
> let exceptions from WorkerRun() slip through to the error handler. I also
> doubt I have the bandwidth to track down all the breakage that could result
> from such a major change when WorkerRunnables are used in all the worker
> APIs, but we can deal with that once we have an approach to fix the above
> problem.
Blocks: 1112920
It is possible to implement Body.json() correctly using just the autoJSAPIOwnsErrorReporting flag that landed in Bug 1107777.
No longer blocks: dom-fetch-api, 1121603
Comment on attachment 8534373 [details] [diff] [review]
Support AutoJSAPI::TakeOwnershipOfErrorReporting on workers

Canceling request until bholley's feedback is addressed
Attachment #8534373 - Flags: feedback?(bent.mozilla)
Notes from irc today:

<bent> well the only issue i remember is that we need to leave exceptions set while running a nested loop
<bent> the cx is where we do that today,
<bent> moving it off cx means we have to figure out somewhere else to stuff it
<bholley> bz, bent: that sounds right
<bholley> steal it and reset it
<bholley> like we do in DOM bindings
<bent> the issue is code that does something like |try { importScripts("scriptThatThrows.js"); } catch(e) { /* do something with e and keep going */ }|
<bent> importScripts runs a nested loop
<bent> so if script that it loads throws then we don't want to report an error immediately
> right
<bent> we need to leave the exception set so that the caller can catch it
> That makes sense
<bent> so putting AutoJSAPI on WorkerRunnable that auto-reports is not going to work
<bent> we have to somehow steal from them
Blocks: 1172246
Assignee: nsm.nikhil → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bzbarsky)
Blocks: 1207693
No longer blocks: 1072150
Depends on: 1224596
OK, so here is the current plan.

1)  Land bug 1224596.
2)  Land attachment 8534373 [details] [diff] [review] or equivalent to allow TakeOwnershipOfErrorReporting to work on
    workers at all.  I filed bug 1225717 on this.
3)  Convert remaining JS_ReportPendingException codepaths to use AutoJSAPI error reporting;
    probably loosely based on attachment 8617521 [details] [diff] [review].  It's possible we can do this somewhat
    incrementally.
4)  Make the default worker error reporter assert that it's only seeing warnings.
Depends on: 1225717
Depends on: 1248719
Depends on: 1248737
Depends on: 1249102
Depends on: 1250185
Depends on: 1250234
Depends on: 1250291
Depends on: 1250963
Depends on: 1250975
Depends on: 1251045
Depends on: 1251272
Depends on: 1251275
Depends on: 1251276
Depends on: 1251369
Depends on: 1251380
Depends on: 1251697
Current status update:

The only worker callers of JS_ReportPendingException that remain are in WorkerRunnable::WorkerRun/PostRun overrides.  I have fixed the obvious implementations of WorkerRun that can throw to not leave an exception on the JSContext unless they want it reported by PostRun.  I have a green try run with unconditional reporting in PostRun.

Now this doesn't mean much, since as I discovered while doing that our test coverage is crappy (e.g. we had no coverage at all for exceptions from XHR methods).  So the next step is to audit all WorkerRun implementations and see whether any of them might be in a situation where they throw an exception but don't report it in PostRun, either because they return true or because they override PostRun.  If any such implementations exist, they need to be fixed.  So that's the next step.
Flags: needinfo?(bzbarsky)
Assignee: nobody → bzbarsky
Flags: needinfo?(bzbarsky)
Depends on: 1252212
Depends on: 1252221
These are on top of revision 42f4ce691c60 if you want to follow along with the line numbers.  That revision includes fixes for some things the audit found, but none of them were really substantive.

There are four outstanding issues found by the audit that are not patched yet.  Bug 1252212 tracks one, and bug 1252221 another.  The other two I'll just fix here.

I have also done an audit of PostRun overrides.  That looks like this:

1)  CancelRunnable in WebSocket.cpp.  Doesn't report exceptions.  But its
WorkerRun can't throw an exception either.  So OK.

2)  WorkerRunnableDispatcher in WebSocket.  Doesn't report exceptions.  Looks like the corresponding WorkerRun can't throw.

3)  ScriptExecutorRunnable::PostRun.  See analysis for ScriptExecutorRunnable::WorkerRun

4)  ModifyBusyCountRunnable::PostRun.  Not an issue, since ModifyBusyCountRunnable::WorkerRun never throws.

5)  CloseEventRunnable.  Calls superclas, so is all good.

6)  ExtendableFunctionalEventWorkerRunnable.  Calls superclass.

7)  NotificationWorkerRunnable.  Doesn't call superclass.  WorkerRunInternal
impls do various stuff.  But all of that stuff seems like it can't throw.
Attachment #8534373 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Attachment #8617521 - Attachment is obsolete: true
Blocks: 1252565
Looks like test_recursion is failing on Android debug only.  Debug only because it's failing the MOZ_ASSERT_IF(result, !jsapi->HasException()); assertion I added in this bug.  Android only because... I'm not sure why yet.  Trying to figure out which runnable this might be.
So the Android thing is basically asserting with this stack:

20:17:10     INFO -   0  libxul.so!mozilla::dom::workers::WorkerRunnable::Run [WorkerRunnable.cpp:ed12e58edd01 : 365 + 0x2]
20:17:10     INFO -   1  libxul.so!mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked [WorkerPrivate.cpp:ed12e58edd01 : 4966 + 0x7]
20:17:10     INFO -   2  libxul.so!mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnables [WorkerPrivate.h:ed12e58edd01 : 1378 + 0x5]
20:17:10     INFO -   3  libxul.so!mozilla::dom::workers::WorkerPrivate::InterruptCallback [WorkerPrivate.cpp:ed12e58edd01 : 4690 + 0x5]
20:17:10     INFO -   4  libxul.so!InvokeInterruptCallback [Runtime.cpp:ed12e58edd01 : 578 + 0x3]
20:17:10     INFO -   5  libxul.so!Interpret [jscntxt.h:ed12e58edd01 : 670 + 0x5]

That line number in WorkerRunnable::Run seems bogus to me; the log also has:

20:18:16     INFO -  03-01 20:16:50.837 F/MOZ_Assert( 2430): Assertion failure: !jsapi->HasException(), at /builds/slave/try-and-api-15-d-0000000000000/build/src/dom/workers/WorkerRunnable.cpp:374

That's an assert I added right _before_ calling WorkerRun() to assert that there is no pending exception on the JSContext.  It's failing, so we have an exception on the JSContext even before we've called WorkerRun().  Now what I'm not sure is how it got there, exactly...
We're in the interrupt callback ... perhaps we interrupted the JS engine at a point where there was a pending exception?
So just as an update, that is in fact what's going on, based on more try runs and asserts.  We're entering the interrupt callback with a pending exception on the JSContext.

I _think_ this is not an intended mode of operation for the interrupt callback and #jsapi seems to agree.  Doing another try run now with some SpiderMonkey changes to fix the obvious places where it does that right now: an exception thrown out of jitcode back to Interpret.  Which is something we could totally be hitting in the test_recursion test.

If that helps, I plan to beef up the asserts we have as follows:

1)  Assert in the interrupt callback entry that there is no exception on aCx.
2)  Assert in WorkerRunnable::Run that there is no exception on the JSContext before we call WorkerRun.
Depends on: 1252905
Flags: needinfo?(bzbarsky)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: