Close the tab will not kill the worker if there is a console.log() keep printing log to DevTools

RESOLVED FIXED in Firefox 66

Status

()

defect
P2
blocker
RESOLVED FIXED
5 years ago
8 months ago

People

(Reporter: timdream, Assigned: baku)

Tracking

Trunk
mozilla66
x86
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10s?, firefox66 fixed)

Details

Attachments

(1 attachment, 3 obsolete attachments)

Posted file cpu-burner.html (obsolete) —
STR:

1. Open test case, a worker will start running
2. Open devtool and verify numbers being printed on the console
3. Verify the CPU usage of Firefox is high with |top|
4. Open a new tab
5. Close the tab with test case
6. Wait for a few second and observe |top|

Expected:

1. Firefox stop eating CPU since Worker should be removed.

Actual:

1. Firefox keep eating CPU and your laptop start burning your hands.
Summary: Close the tab will not kill the worker if there is a console.log() in it → Close the tab will not kill the worker if there is a console.log() keep printing log to DevTools
ochameau ask me to try to replace |console.log()| with |dump()| and see if the issue is still there. It's not. So this looks more likely a DevTools issue according to him.
Component: DOM: Workers → Developer Tools: Console
Product: Core → Firefox
No, this belongs in the DOM.  Andrea can you investigate here?
Component: Developer Tools: Console → DOM: Workers
Product: Firefox → Core
Actually adding Andreas ... see comment 2
Flags: needinfo?(amarchesini)
working on it.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Posted patch feature.patch (obsolete) — Splinter Review
Attachment #8422377 - Flags: review?(khuey)
Why is this necessary?  Why isn't JS interrupted when the worker wants to shut down?:
Flags: needinfo?(amarchesini)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> Why is this necessary?  Why isn't JS interrupted when the worker wants to
> shut down?:

The reason why I wrote this is because a worker is not kill if there is a runnable running. When console.log() is called in workers we run this:

nsRefPtr<ConsoleCallDataRunnable> runnable =
  new ConsoleCallDataRunnable(callData);
runnable->Dispatch();

This keeps the worker and the JS alive. This patch fixes the problem for the Console, but maybe bent or you can come out with some better global idea.
Flags: needinfo?(amarchesini)
Comment on attachment 8422377 [details] [diff] [review]
feature.patch

This is just wallpaper over a much more serious issue.
Attachment #8422377 - Flags: review?(khuey) → review-
See Also: → 1145866
So I never got around to fixing this ...

One of the tricky problems worker shutdown has to deal with is what to do with JS that is not yieldin.  We assume that we're doing one of four things:

1. We're running some chrome JS ctypes call, such as an epoll.  We don't attempt to handle this case, and will simply block shutdown until it completes.
2. We're running an infinite loop solely within JS.
3. We're running an infinite loop in JS that is making "short-lived" calls into the DOM, where "short-lived" means something that does not spin any event queues.
4. We're currently running a sync queue.

When we want to shutdown a worker, we dispatch a NotifyRunnable, which is a control runnable.  That pokes the operation callback, which interrupts running JS to run the control queue.  We then return false from the operation callback which aborts the running script.

This handles case 2, and it handles case 3 as soon as we return from the DOM call into JS (the operation callback will trigger immediately and we'll run the NotifyRunnable).  It does not handle case 4.  We used to deal with this by propagating the false return value from RunCurrentSyncLoop to the caller (of the JSNative entry point into the DOM) which would trigger an uncatchable exception.  Some of that was lost when we moved to WebIDL.

I believe the Console used to make sync calls to the main thread, and doesn't anymore, so this might just be fixed.  But some other sites need some work.  e.g. http://searchfox.org/mozilla-central/rev/d9a04f71630ce4203ff0a5e26722723045d035b5/dom/workers/WorkerRunnable.cpp#593 looks correct, but http://searchfox.org/mozilla-central/rev/d9a04f71630ce4203ff0a5e26722723045d035b5/dom/xhr/XMLHttpRequestWorker.cpp#1827 does not, and neither does http://searchfox.org/mozilla-central/rev/d9a04f71630ce4203ff0a5e26722723045d035b5/dom/workers/ScriptLoader.cpp#2112.

I'll needinfo bz since he touched some of this last, and can probably pawn it off on baku or bkelly if needed.
Flags: needinfo?(bzbarsky)
I've been trying to see if the bug reproduces in the latest Nightly and the problem is getting worse with e10s. I am unable to close the tab once it starts running. Neither I can Cmd+Q the Nighty app afterwards. Both the chrome and content process gets too busy, making switching tab hard.
Severity: normal → blocker
tracking-e10s: --- → ?
But yeah, in non-e10s the bug is fixed, considering the repro step in comment 0.
(In reply to Tim Guan-tin Chien [:timdream] (please needinfo) from comment #11)
> Neither I can Cmd+Q the Nighty app afterwards.

Cmd+Q works after a very long wait, console output the following:

$ mach run https://bugzilla.mozilla.org/attachment.cgi?id=8415079
 0:00.25 /Users/timdream/Repositories/gecko/gecko-dev/objdir-artifact/dist/Nightly.app/Contents/MacOS/firefox https://bugzilla.mozilla.org/attachment.cgi?id=8415079 -no-remote -foreground -profile /Users/timdream/Repositories/gecko/gecko-dev/objdir-artifact/tmp/scratch_user
WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"SessionStore: flushing all windows","state":{"total":1,"current":0},"filename":"resource:///modules/sessionstore/SessionStore.jsm","lineNumber":1465,"stack":["resource:///modules/sessionstore/SessionStore.jsm:ssi_onQuitApplicationGranted:1465","resource:///modules/sessionstore/SessionStore.jsm:ssi_observe:648","chrome://global/content/bindings/remote-browser.xml:permitUnload:356","chrome://browser/content/browser.js:CanCloseWindow:6173","chrome://browser/content/browser.js:WindowIsClosing:6193","chrome://browser/content/browser.xul:onclose:1"]}] Barrier: quit-application-granted
Posted patch worker.patch (obsolete) — Splinter Review
We still don't support this particular case because we end up creating a loop of scheduling of GarbageCollectRunnable. We don't have onclose event anymore, so when we shutdown workers we can immediately stop scheduling GC.
Assignee: nobody → amarchesini
Attachment #8415079 - Attachment is obsolete: true
Attachment #8422377 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Attachment #8787452 - Flags: review?(bkelly)
Comment on attachment 8787452 [details] [diff] [review]
worker.patch

Review of attachment 8787452 [details] [diff] [review]:
-----------------------------------------------------------------

Can't we still be running js in the Closing state?  We don't abort in that case.  I don't think this test case exercises that path, though.

Also, shouldn't we get this for free once we collapse Terminating/Canceling/Killing into a single state?
Attachment #8787452 - Flags: review?(bkelly) → review-
Also, I think we are mostly handling this case today now as well.  For example:

1) Open fresh dev-edition browser
2) open example.com tab
3) Open comment 0 attachment in tab
4) Observe CPU usage increases
5) Close comment 0 attachment tab (this will take a while)
6) Open about:memory tab
7) Minimize memory
8) Observe CPU usage drops

The memory minimization causes us to enter the Terminating state which aborts the script.  This seems to cleanup correctly.

This is a bigger problem in e10s because we don't get GC's scheduled as often since a lot of chrome script lives in the parent process.

The jankiness seems induced by console.log() flooding the main thread with runnables.  I think perhaps we should instead write a bug on console to not do this.  For example, it could allow at most one outstanding main thread Runnable per worker thread.  Any queued up console reports get sent in the next main thread runnable.

Andrea, what do you think?
Flags: needinfo?(amarchesini)
Attachment #8415079 - Attachment is obsolete: false
Andrea, I think I might have seen the GC loop thing as well.  I might not have seen previously since it will probably be timing dependent based on when the GC timers fire.
No longer depends on: 1300118
So I'm trying to understand comment 14. What is the loop here?
I don't remember all the details, but basically we were in state Closing or Terminating, but could never progress to Killing.  GC control runnables kept getting dispatched before we could progress to Killing.  Because we only cancel GC timers in Killing this could prevent shutdown forever.
I can no longer reproduce comment 0 on Nightly (other than the fact console blocks the chrome process). If nothing else is actionable here we should close this as WFM.
:Baku, can you verify Tim's last comment and then close this bug accordingly?
Priority: -- → P2
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → WORKSFORME
Flags: needinfo?(amarchesini)
Yes, this bug is fixed. I confirm.
Blake, I don't know how familiar you are with the worker shutdown but this is what happens:

1. When a worker is terminated, we call: Notify(Canceling) on the parent thread.
2. This triggers a NotifyInternal(Canceling) call on the worker thread via a ControlRunnable.
3. Dispatching a ControlRunnable triggers a JS interruption.
4. If we are in a sync loop (xhr? url CTOR?), it can happen that we process the control runnable queue there (maybe we could refactoring this part, but it's not related to this bug).
5. The worker is set to 'canceling' state. At this point no JS execution should happen.
6. When the interruption callback is executed, there are no control runnables in the queue. This means that we don't return 'false' and JS is still executed.

You can reproduce this issue with something like: |while (true) { fetch(something); }|

My patch checks the status in the intercept callback. Plus it avoids processing normal runnables in sync loop when the worker is terminated.

Follow ups will be:

1. Ignore the return value for Control runnables. We can simply check WorkerPrivate::mStatus
2. Check if we can avoid processing all the control runnables in multiple places and just process them in the intercept callback.
Attachment #8415079 - Attachment is obsolete: true
Attachment #8787452 - Attachment is obsolete: true
Attachment #9031104 - Flags: review?(mrbkap)
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment on attachment 9031104 [details] [diff] [review]
worker_shutdown.patch

Review of attachment 9031104 [details] [diff] [review]:
-----------------------------------------------------------------

I think I understand this enough to stamp it.
Attachment #9031104 - Flags: review?(mrbkap) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d4a6ccb4811
Always terminate workers when in canceling state, r=mrbkap
https://hg.mozilla.org/mozilla-central/rev/2d4a6ccb4811
Status: REOPENED → RESOLVED
Closed: Last year8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.