Closed Bug 1865774 Opened 8 months ago Closed 8 months ago

Intermittent TEST-UNEXPECTED-TIMEOUT /webmessaging/broadcastchannel/blobs.html | single tracking bug

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
122 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox120 --- unaffected
firefox121 --- fixed
firefox122 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: edenchuang)

References

(Regression)

Details

(Keywords: intermittent-failure, intermittent-testcase, regression)

Attachments

(2 files)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=436999716&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Dox1L-eZSjauZqHGuXIlIw/runs/0/artifacts/public/logs/live_backing.log


[task 2023-11-21T07:34:32.898Z] 07:34:32     INFO - TEST-START | /wasm/jsapi/memory/grow.any.worker.html
[task 2023-11-21T07:34:32.911Z] 07:34:32     INFO - Closing window 0bac6d09-aa82-4662-ae54-9a6976be77f6
[task 2023-11-21T07:34:32.996Z] 07:34:32     INFO - 
[task 2023-11-21T07:34:32.996Z] 07:34:32     INFO - TEST-PASS | /webmessaging/broadcastchannel/blobs.html | Blobs work on BroadcastChannel 
[task 2023-11-21T07:34:32.996Z] 07:34:32     INFO - TEST-UNEXPECTED-TIMEOUT | /webmessaging/broadcastchannel/blobs.html | Blobs work with workers on BroadcastChannel - Test timed out
[task 2023-11-21T07:34:32.996Z] 07:34:32     INFO - TEST-UNEXPECTED-TIMEOUT | /webmessaging/broadcastchannel/blobs.html | expected OK
[task 2023-11-21T07:34:32.996Z] 07:34:32     INFO - TEST-INFO took 10678ms
[task 2023-11-21T07:34:32.999Z] 07:34:32     INFO - PID 7080 | 1700552072998	Marionette	INFO	Stopped listening on port 49243
[task 2023-11-21T07:34:33.051Z] 07:34:33     INFO - ...................
[task 2023-11-21T07:34:33.051Z] 07:34:33     INFO - TEST-OK | /wasm/jsapi/memory/grow.any.worker.html | took 149ms
[task 2023-11-21T07:34:33.051Z] 07:34:33     INFO - TEST-START | /wasm/jsapi/memory/grow.any.html

Still failing after changes in https://bugzilla.mozilla.org/show_bug.cgi?id=1864459#c8 reached central.

Flags: needinfo?(echuang)

Found an edge case that the Worker is GCed immediately.

Worker creation and WorkerThreadPrimaryRunnable dispatching.

0:13.86 pid:15848 [Child 15856: Main Thread]: D/WorkerPrivate WorkerPrivate::WorkerPrivate [1286cab00]
0:13.86 pid:15848 [Child 15856: Main Thread]: D/WorkerPrivate WorkerPrivate::Start [1286cab00]
0:13.86 pid:15848 [Child 15856: Main Thread]: V/WorkerRunnable WorkerRunnable::DispatchInternal [1057ea9d0]
0:13.86 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable: 1057ea9d0
0:13.86 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable 1057ea9d0 is queued in mPreStartRunnables

worker.postMessage in the test. The WorkerRunnable would be saved in mPreStartRunnables.

0:13.86 pid:15848 postMessage channel to worker
0:13.86 pid:15848 [Child 15856: Main Thread]: V/WorkerRunnable WorkerRunnable::DispatchInternal [127922980]
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable: 127922980
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable 127922980 is queued in mPreStartRunnables
0:13.87 pid:15848 postMessage blob to worker
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerRunnable WorkerRunnable::DispatchInternal [127922a20]
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable: 127922a20
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::DispatchLockHeld [1286cab00] runnable 127922a20 is queued in mPreStartRunnables

WorkerThreadPrimaryRunnable::Run()
We suppose that runnable in mPreStartRunnables would be moved to the thread event queue after WorkerPrivate::SetWorkerPrivateInWorkerThread(). So now WorkerPrivate::mThread->HasPendingEvents() should be true.

0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::SetWorkerPrivateInWorkerThread [1286cab00]
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::AddWorkerRef [1286cab00] mStatus: 0, aFailStatus: (3)
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::DoRunLoop [1286cab00]
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerPrivate WorkerPrivate::DoRunLoop [1286cab00] mStatus 1 before getting events to run
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate Worker 1286cab00 scheduled periodic GC timer
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerRunnable WorkerRunnable::Run [1057ea9d0]
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerRef IPCWorkerRef::IPCWorkerRef [1057eb290] aWorkerPrivate: 1286cab00 aName: ClientManagerChild
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::AddWorkerRef [1286cab00] mStatus: 1, aFailStatus: (3)
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerRef IPCWorkerRef::SetActorCount [1057eb290] aCount: 1
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerPrivate WorkerPrivate::AdjustNonblockingCCBackgroundActors [1286cab00] (1/0)

In the meantime, the main thread checks if the worker can be GCed. And the answer is "Yes," but it should be "No."

0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate WorkerPrivate::IsEligibleForCC [1286cab00]
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate mMainThreadEventTarget: empty
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate mMainThreadEventTargetForMessaging: empty
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate mMainThreadDebuggerEventTarget: empty
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate mCCFlagSaysEligible: true
0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate hasShutdownTasks: false

We suppose hasPendingEvents should be "true" here, but it is not.

0:13.87 pid:15848 [Child 15856: Main Thread]: V/WorkerPrivate hasPendingEvents: false

Then GC the worker, so the status changes to "Canceling"

0:13.87 pid:15848 [Child 15856: Main Thread]: D/WorkerPrivate WorkerPrivate::DispatchControlRunnable [1286cab00] runnable 102babf70
0:13.87 pid:15848 [Child 15856: DOM Worker]: V/WorkerRunnable WorkerRunnable::Run [102babf70]
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyInternal [1286cab00] mStatus: 1, aStatus: 3
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyWorkerRefs [1286cab00] aStatus: 3
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyWorkerRefs [1286cab00] WorkerRefs(WeakWorkerRef 1057eab60)
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::RemoveWorkerRef [1286cab00] aWorkerRef: 1057eab60
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyWorkerRefs [1286cab00] WorkerRefs(ClientManagerChild 1057eb290)
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyWorkerRefs [1286cab00] WorkerRefs(WeakWorkerRef 102bf5f80)
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::RemoveWorkerRef [1286cab00] aWorkerRef: 102bf5f80
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyWorkerRefs [1286cab00] WorkerRefs(ScriptLoader 102bf60c0)
0:13.87 pid:15848 [Child 15856: DOM Worker]: D/WorkerPrivate WorkerPrivate::NotifyInternal [1286cab00] abort script

Assignee: nobody → echuang
Status: NEW → ASSIGNED
Flags: needinfo?(echuang)

Set release status flags based on info from the regressing bug 1836700

Severity: S4 → S3
Priority: P5 → P2
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e97f959d1225
Allow WorkerThread::HasPendingEvents() can be accessed on the parent thread. r=asuth
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 122 Branch

When double-checking things I realized we probably still have a gap with HasPendingEvents(). My original (unposted) proposal before HasPendingEvents was that we would set a flag on dispatch and clear it once WorkerPrivate::DoRunLoop had drained the events. At a high level, that was conceptually simple, but practically speaking it would end up much more messy and confusing, especially because of locking concerns like those that give rise to mOtherThreadsDispatchingViaEventTarget. So when I realized HasPendingEvents would do most of the work I proposed that, but...

HasPendingEvents() is great for letting us know as soon as a runnable is scheduled without us having to do anything extra. But it can start returning false as soon as ProcessNextEvent pulls the event off the queue to run it. We need to make sure that we set a flag that will cause IsEligibleForCC to return false that will at least cover:

  • From before calling any function that will pull an event off an event queue that we intend to process.
  • Until we've fully completed running the event.

The logic for this doesn't need to be particularly complex. In fact, I think one potentially quite simple thing we could do is.

mWorkerLoopIsIdle = true;
WaitForWorkerEvents();
mWorkerLoopIsIdle = false;

Note that I propose setting the flag outside of WaitForWorkerEvents because this way we only set the flag if we're in the top level loop, as the other 3 of 4 calls to the method would not be indicative of the worker being idle.

Status: RESOLVED → REOPENED
Flags: needinfo?(echuang)
Resolution: FIXED → ---
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f4fae08660c1
Add WorkerPrivate::mWorkerLoopIsIdle to indicate the Worker is idle and could be GCed. r=asuth
Status: REOPENED → RESOLVED
Closed: 8 months ago8 months ago
Resolution: --- → FIXED

Comment on attachment 9364892 [details]
Bug 1865774 - Add WorkerPrivate::mWorkerLoopIsIdle to indicate the Worker is idle and could be GCed. r=asuth

Beta/Release Uplift Approval Request

  • User impact if declined: The worker might be GCed unexpectedly, and significant functionality of the GCed worker could be interrupted by the unexpected GC.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patches are low risk since the patches just make the pending event checking can be done on the parent thread of the worker, which is the place to judge if the Worker should be GCed or not. And making GC a Worker can only happen when the worker is idle.
  • String changes made/needed: No
  • Is Android affected?: Yes
Flags: needinfo?(echuang)
Attachment #9364892 - Flags: approval-mozilla-beta?
Attachment #9364716 - Flags: approval-mozilla-beta?

Comment on attachment 9364716 [details]
Bug 1865774 - Allow WorkerThread::HasPendingEvents() can be accessed on the parent thread. r=asuth

Approved for 121.0b3

Attachment #9364716 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9364892 [details]
Bug 1865774 - Add WorkerPrivate::mWorkerLoopIsIdle to indicate the Worker is idle and could be GCed. r=asuth

Approved for 121.0b3

Attachment #9364892 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: