Closed Bug 1529163 Opened 10 months ago Closed 5 months ago

Messages sent to workers can be lost when the worker closes

Categories

(DevTools :: Debugger, defect, P2)

defect

Tracking

(firefox70 fixed)

RESOLVED FIXED
Firefox 70
Tracking Status
firefox70 --- fixed

People

(Reporter: bhackett, Assigned: bhackett)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [debugger-mvp])

Attachments

(2 files)

Attached patch patchSplinter Review

Recent breakpoint changes seem to have exposed a problem with the protocol used when communicating with a worker thread. The series of steps is as follows:

  • The client sends a request to an actor associated with a worker.
  • This request goes through the normal procedure for communicating with the worker, ending with the this._dbg.postMessage() call in worker-transport.js
  • Before the worker can send a response to the request, it closes.
  • No response or error is ever sent to the client.

This is causing the issue below, which has STR:

https://github.com/firefox-devtools/debugger.html/issues/7989

The attached patch is an attempt to fix this problem, by keeping track of messages sent to the worker which haven't received a response yet. When the worker closes, error responses are sent for each of those messages. This does work, but I don't know how correct it is.

Attachment #9045123 - Flags: data-review?(poirot.alex)
Attachment #9045123 - Flags: data-review?(poirot.alex) → review?(poirot.alex)
Comment on attachment 9045123 [details] [diff] [review]
patch

Review of attachment 9045123 [details] [diff] [review]:
-----------------------------------------------------------------

::: devtools/client/debugger/new/src/client/firefox/commands.js
@@ +182,5 @@
> +    try {
> +      await thread.setBreakpoint(location, options);
> +    } catch (e) {
> +      // Worker threads might close down while we are operating on their breakpoints.
> +    }

Such try/catch may hide you in some other context some serious breakage.
We should at least keep logging the error.

Is it a very rare race condition? Or does the workerClients list do not get updated at all when the worker is closed? Or not updated quickly enough?

I don't know if it may help to listen for workerTargetFront.once("close", ()=>{}) in order to remove worker from the list more quickly?
In theory, the Target front should be destroyed when the worker is destroyed.
C++ API should call this nsIWorkerDebugger.onClose listener method:
https://searchfox.org/mozilla-central/rev/4587d146681b16ff9878a6fdcba53b04f76abe1d/devtools/server/actors/targets/worker.js#136
Itself should communicate to the frontend that the target close is to be destroyed. This particular WorkerTargetFront is in the parent process, so that it should still be able to run and communicate to the client after the worker is destroyed.
And then, in theory, all the child client and fronts, including the thread client should be destroyed. I know we do have something to destroy pending Fronts requests:
https://searchfox.org/mozilla-central/source/devtools/shared/protocol.js#1292-1300
But I'm not sure if the old client, like ThreadClient are having a similar feature. Yulia is now looking into bug 1494796, which is aiming to convert it to protocol.js front.
This is this code, which looks like something that should impact the old clients:
https://searchfox.org/mozilla-central/rev/4587d146681b16ff9878a6fdcba53b04f76abe1d/devtools/shared/client/debugger-client.js#669-676
It could be interesting ensuring this works as expected for workers.
Attachment #9045123 - Flags: review?(poirot.alex)

(In reply to Alexandre Poirot [:ochameau] from comment #1)

Is it a very rare race condition? Or does the workerClients list do not get
updated at all when the worker is closed? Or not updated quickly enough?

The STR above happen consistently. As far as I know the worker clients won't be updated until after the workers in the server have closed, but regardless we should be able to send requests to the worker thread at any time and get back either a response or an error.

I don't know if it may help to listen for workerTargetFront.once("close",
()=>{}) in order to remove worker from the list more quickly?
In theory, the Target front should be destroyed when the worker is destroyed.
C++ API should call this nsIWorkerDebugger.onClose listener method:
https://searchfox.org/mozilla-central/rev/
4587d146681b16ff9878a6fdcba53b04f76abe1d/devtools/server/actors/targets/
worker.js#136
Itself should communicate to the frontend that the target close is to be
destroyed. This particular WorkerTargetFront is in the parent process, so
that it should still be able to run and communicate to the client after the
worker is destroyed.
And then, in theory, all the child client and fronts, including the thread
client should be destroyed. I know we do have something to destroy pending
Fronts requests:
https://searchfox.org/mozilla-central/source/devtools/shared/protocol.
js#1292-1300
But I'm not sure if the old client, like ThreadClient are having a similar
feature. Yulia is now looking into bug 1494796, which is aiming to convert
it to protocol.js front.
This is this code, which looks like something that should impact the old
clients:
https://searchfox.org/mozilla-central/rev/
4587d146681b16ff9878a6fdcba53b04f76abe1d/devtools/shared/client/debugger-
client.js#669-676
It could be interesting ensuring this works as expected for workers.

The destroy() method in Front looks like it would work here, since it will reject all outstanding requests on the front. Is there an ETA for bug 1494796? Right now this is being worked around (https://github.com/firefox-devtools/debugger.html/pull/7996) in a way that can affect the UX where breakpoints in workers might not be hit as expected.

Priority: -- → P2
Depends on: 1494796

Brian, what is the status of this?

Once bug 1494796 is fixed, this issue should be fixed as well and the client can make sure breakpoints have been added/removed from workers before resuming execution.

Blocks: dbg-70
Whiteboard: [debugger-mvp]

Now that bug 1494796 has landed, this should be fixed. The patch I just posted waits on all threads when sending messages to the server. With this patch applied I can't reproduce the STR in comment 0.

Status: NEW → ASSIGNED
Pushed by bhackett@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3be4bceb9918
Wait for all threads in debugger client, r=jlast.
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 70
You need to log in before you can comment on or make changes to this bug.