Closed Bug 1748333 Opened 2 years ago Closed 2 years ago

Assertion failure: !mTaskQueue->IsOnCurrentThread() (TaskQueue::AwaitIdle must not be called on itself), at /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:34

Categories

(Core :: WebRTC, defect, P2)

x86_64
Linux
defect

Tracking

()

RESOLVED DUPLICATE of bug 1754867
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix

People

(Reporter: jkratzer, Assigned: bwc)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, testcase, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(3 files, 1 obsolete file)

Testcase found while fuzzing mozilla-central rev 1cb2015e6fbc (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 1cb2015e6fbc --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Assertion failure: !mTaskQueue->IsOnCurrentThread() (TaskQueue::AwaitIdle must not be called on itself), at /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:34

    ==1180106==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f4315de2320 bp 0x7f4309280760 sp 0x7f4309280730 T1180264)
    ==1180106==The signal is caused by a WRITE memory access.
    ==1180106==Hint: address points to the zero page.
        #0 0x7f4315de2320 in mozilla::TaskQueueWrapper::Delete() /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:33:5
        #1 0x7f4315dde866 in operator() /third_party/libwebrtc/api/task_queue/task_queue_base.h:78:66
        #2 0x7f4315dde866 in operator() /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:107:31
        #3 0x7f4315dde866 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:305:7
        #4 0x7f4315dde866 in ~UniquePtr /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:253:18
        #5 0x7f4315dde866 in ~CallWorkerThread /dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:42:31
        #6 0x7f4315dde866 in mozilla::CallWorkerThread::~CallWorkerThread() /dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:42:31
        #7 0x7f4311c06d6c in mozilla::AbstractThread::Release() /xpcom/threads/AbstractThread.cpp:245:1
        #8 0x7f4315daaace in mozilla::CallWorkerThread::Release() /dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:45:1
        #9 0x7f4311c17f97 in operator() /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:463:5
        #10 0x7f4311c17f97 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:305:7
        #11 0x7f4311c17f97 in ~UniquePtr /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:253:18
        #12 0x7f4311c17f97 in ~TaskGroupRunnable /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:207:9
        #13 0x7f4311c17f97 in mozilla::AutoTaskDispatcher::TaskGroupRunnable::~TaskGroupRunnable() /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:207:9
        #14 0x7f4311c238e7 in mozilla::Runnable::Release() /xpcom/threads/nsThreadUtils.cpp:63:1
        #15 0x7f4315de2151 in ~nsCOMPtr /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:451:7
        #16 0x7f4315de2151 in ~ /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:64:9
        #17 0x7f4315de2151 in ~RunnableFunction /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:522:7
        #18 0x7f4315de2151 in mozilla::detail::RunnableFunction<mozilla::TaskQueueWrapper::CreateTaskRunner(nsCOMPtr<nsIRunnable>)::'lambda'()>::~RunnableFunction() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:522:7
        #19 0x7f4311c238e7 in mozilla::Runnable::Release() /xpcom/threads/nsThreadUtils.cpp:63:1
        #20 0x7f4311c1f49c in assign_assuming_AddRef /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:427:7
        #21 0x7f4311c1f49c in operator= /builds/worker/workspace/obj-build/dist/include/nsCOMPtr.h:696:5
        #22 0x7f4311c1f49c in mozilla::TaskQueue::Runner::Run() /xpcom/threads/TaskQueue.cpp:211:19
        #23 0x7f4311c3aa9b in nsThreadPool::Run() /xpcom/threads/nsThreadPool.cpp:305:14
        #24 0x7f4311c31119 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1177:16
        #25 0x7f4311c382ba in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
        #26 0x7f43126dbd9b in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
        #27 0x7f43125fa0b7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
        #28 0x7f43125f9fc2 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
        #29 0x7f43125f9fc2 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
        #30 0x7f4311c2cd4b in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:391:10
        #31 0x7f432735f997 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #32 0x7f43280d3608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
        #33 0x7f4327c9b292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:33:5 in mozilla::TaskQueueWrapper::Delete()
    ==1180106==ABORTING
Attached file Testcase

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220104034109-8bc2581b2c7b.
The bug appears to have been introduced in the following build range:

Start: 950a11eb5af31583c98e138a36380d1ecc0609f8 (20211130174121)
End: e545d6e273fe67f7cfbdb088804e26f3107567e8 (20211130195452)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=950a11eb5af31583c98e138a36380d1ecc0609f8&tochange=e545d6e273fe67f7cfbdb088804e26f3107567e8

Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]

Looks like there's a non-unit-test way of hitting this? Thoughts?

Flags: needinfo?(apehrson)

Grumble. I suppose if it happens to be released on itself, we can assume any lifetime guarantees are already fulfilled, but if it is released on another thread we must block so as not to destroy something prematurely. I'll have to ponder this a bit...
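The rule described above can be sketched as a small decision function. This is only an illustration: `ShutdownAction` and `ChooseShutdownAction` are hypothetical names, and modelling "released on itself" as a thread-id comparison is an assumption made for the sketch, not how the wrapper necessarily checks it.

```cpp
#include <cassert>
#include <thread>

// Hypothetical names; not taken from the Gecko sources.
enum class ShutdownAction { SkipWait, BlockUntilIdle };

// If the last reference is dropped on the queue's own thread, waiting for the
// queue to go idle would be waiting on ourselves, so skip the wait. On any
// other thread, block so nothing still running on the queue is destroyed early.
ShutdownAction ChooseShutdownAction(std::thread::id aQueueThread,
                                    std::thread::id aCurrentThread) {
  // A default-constructed id represents "no thread"; the caller must be real.
  assert(aCurrentThread != std::thread::id{});
  return aQueueThread == aCurrentThread ? ShutdownAction::SkipWait
                                        : ShutdownAction::BlockUntilIdle;
}
```

Passing the same id for both arguments selects `SkipWait`; any other combination selects `BlockUntilIdle`.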

Assignee: nobody → apehrson
Severity: -- → S3
Flags: needinfo?(apehrson)
Priority: -- → P2

:pehrsons, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(apehrson)
Flags: needinfo?(apehrson)
Regressed by: 1741118

TaskQueueWrapper is used in two ways:

  • Through the ref-counted CallWorkerThread, which is, well, ref-counted, and
    passed around gecko as an AbstractThread. The lifetime of this task queue is
    managed by the ref-count and it may be deleted on any thread (bug 1748333...).
  • Through the TaskQueueFactory for libwebrtc's internal use. libwebrtc manages
    the lifetime of these task queues explicitly, and does not delete them on the
    task queue itself.

This patch adds a DeletionPolicy template parameter to TaskQueueWrapper to flag
up front whether a TaskQueueWrapper will block shutdown or not.

For the former case above we use DeletionPolicy::NonBlocking and for the latter
DeletionPolicy::Blocking.
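
A minimal sketch of how a compile-time deletion policy could look, assuming C++17. Only the names DeletionPolicy, Blocking and NonBlocking come from the patch description; `FakeTaskQueue`, `TaskQueueWrapperSketch` and `CallsMadeByDelete` are hypothetical stand-ins, and the body of `Delete()` is a guess at the general shape rather than the landed code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustration only; this is not the code from the patch.
enum class DeletionPolicy { Blocking, NonBlocking };

// Stand-in for the wrapped task queue; it only records which calls were made.
struct FakeTaskQueue {
  std::vector<std::string> calls;
  void BeginShutdown() { calls.push_back("BeginShutdown"); }
  void AwaitIdle() { calls.push_back("AwaitIdle"); }
};

template <DeletionPolicy Policy>
class TaskQueueWrapperSketch {
 public:
  explicit TaskQueueWrapperSketch(FakeTaskQueue& aQueue) : mQueue(aQueue) {}

  void Delete() {
    mQueue.BeginShutdown();
    // Only the Blocking flavour waits; the NonBlocking flavour can therefore
    // be deleted from the queue itself without waiting on its own work.
    if constexpr (Policy == DeletionPolicy::Blocking) {
      mQueue.AwaitIdle();
    }
  }

 private:
  FakeTaskQueue& mQueue;
};

// Runs Delete() under the given policy and returns the recorded calls.
template <DeletionPolicy Policy>
std::vector<std::string> CallsMadeByDelete() {
  FakeTaskQueue queue;
  TaskQueueWrapperSketch<Policy> wrapper(queue);
  wrapper.Delete();
  assert(queue.calls.front() == "BeginShutdown");
  return queue.calls;
}
```

Because the policy is a template parameter, the choice is visible at the point where the wrapper type is named, which is the "up front" flagging the description refers to.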

Has Regression Range: --- → yes
Pushed by na-g@nostrum.com:
https://hg.mozilla.org/integration/autoland/rev/df3acbea677e
Add DeletionPolicy to TaskQueueWrapper. r=ng
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Bugmon Analysis
Bug marked as FIXED but still reproduces on mozilla-central 20220201093942-4bff0b888cd9.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Ok, since this is still happening, and this changeset is causing fairly frequent intermittent failures over in bug 1752959, we may want to back this out until we can determine how to fix 1752959.

Regressions: 1752959
Regressions: 1752963

(In reply to Byron Campen [:bwc] from comment #11)

> Ok, since this is still happening, and this changeset is causing fairly frequent intermittent failures over in bug 1752959, we may want to back this out until we can determine how to fix 1752959.

Agreed

It looks like the testcase is now triggering the following assertion. If this is unrelated, please let me know and I'll open a new bug:

Assertion failure: get() (dereferencing a UniquePtr containing nullptr with ->), at /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:284

    #0 0x7fbecc28649e in operator-> /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:284:5
    #1 0x7fbecc28649e in mozilla::AutoTaskDispatcher::DispatchTasksFor(mozilla::AbstractThread*) /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:178:11
    #2 0x7fbecc28d901 in mozilla::TaskQueue::BeginShutdown() /builds/worker/checkouts/gecko/xpcom/threads/TaskQueue.cpp:157:20
    #3 0x7fbed04cd553 in mozilla::TaskQueueWrapper<(mozilla::DeletionPolicy)1>::Delete() /builds/worker/checkouts/gecko/dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:40:19
    #4 0x7fbed0497666 in operator() /builds/worker/checkouts/gecko/third_party/libwebrtc/api/task_queue/task_queue_base.h:78:66
    #5 0x7fbed0497666 in operator() /builds/worker/checkouts/gecko/dom/media/webrtc/libwebrtcglue/TaskQueueWrapper.h:122:31
    #6 0x7fbed0497666 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:305:7
    #7 0x7fbed0497666 in ~UniquePtr /builds/worker/workspace/obj-build/dist/include/mozilla/UniquePtr.h:253:18
    #8 0x7fbed0497666 in ~CallWorkerThread /builds/worker/checkouts/gecko/dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:44:31
    #9 0x7fbed0497666 in mozilla::CallWorkerThread::~CallWorkerThread() /builds/worker/checkouts/gecko/dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:44:31
    #10 0x7fbecc2746dc in mozilla::AbstractThread::Release() /builds/worker/checkouts/gecko/xpcom/threads/AbstractThread.cpp:247:1
    #11 0x7fbed046393e in mozilla::CallWorkerThread::Release() /builds/worker/checkouts/gecko/dom/media/webrtc/libwebrtcglue/CallWorkerThread.h:47:1
    #12 0x7fbecc286799 in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:50:40
    #13 0x7fbecc286799 in Release /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:381:36
    #14 0x7fbecc286799 in ~RefPtr /builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h:81:7
    #15 0x7fbecc286799 in mozilla::AutoTaskDispatcher::DispatchTaskGroup(mozilla::UniquePtr<mozilla::AutoTaskDispatcher::PerThreadTaskGroup, mozilla::DefaultDelete<mozilla::AutoTaskDispatcher::PerThreadTaskGroup> >) /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:279:3
    #16 0x7fbecc285c29 in mozilla::AutoTaskDispatcher::~AutoTaskDispatcher() /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:123:7
    #17 0x7fbecc286f61 in reset /builds/worker/workspace/obj-build/dist/include/mozilla/Maybe.h:639:19
    #18 0x7fbecc286f61 in mozilla::XPCOMThreadWrapper::MaybeFireTailDispatcher() /builds/worker/checkouts/gecko/xpcom/threads/AbstractThread.cpp:196:23
    #19 0x7fbecc283d6c in AfterProcessNextEvent /builds/worker/checkouts/gecko/xpcom/threads/AbstractThread.cpp:134:5
    #20 0x7fbecc283d6c in non-virtual thunk to mozilla::XPCOMThreadWrapper::AfterProcessNextEvent(nsIThreadInternal*, bool) /builds/worker/checkouts/gecko/xpcom/threads/AbstractThread.cpp
    #21 0x7fbecc29fc18 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1219:3
    #22 0x7fbecc2a6a1a in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:467:10
    #23 0x7fbeccd47006 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:85:21
    #24 0x7fbeccc6bd67 in MessageLoop::RunInternal() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:331:10
    #25 0x7fbeccc6bc72 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:324:3
    #26 0x7fbeccc6bc72 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:306:3
    #27 0x7fbed0f41af8 in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:137:27
    #28 0x7fbed2fa3693 in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:878:20
    #29 0x7fbeccd47efa in mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:235:9
    #30 0x7fbeccc6bd67 in MessageLoop::RunInternal() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:331:10
    #31 0x7fbeccc6bc72 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:324:3
    #32 0x7fbeccc6bc72 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:306:3
    #33 0x7fbed2fa2ccc in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:715:34
    #34 0x557136dfc029 in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
    #35 0x557136dfc029 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:327:18

I'll take point on this, since pehrsons is on PTO.

Assignee: apehrson → docfaraday
Backout by mlaza@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/6a46314d2667
Backed out changeset df3acbea677e as requested by the dev. a=backout
Flags: needinfo?(docfaraday)

FWIW, I'm not seeing the crash (or any crash) with a locally built fuzzing asan build with D137577.

Attachment #9261783 - Attachment description: WIP: Bug 1748333: Wait for BeginShutdown to resolve before deleting. → Bug 1748333: Wait for BeginShutdown to resolve before deleting.
Attachment #9261783 - Attachment description: Bug 1748333: Wait for BeginShutdown to resolve before deleting. → Bug 1748333: Wait for BeginShutdown to resolve before deleting. r?ng

Maybe this is necessary, but unsure.

Depends on D137577

Attachment #9261251 - Attachment description: Bug 1748333 - Add DeletionPolicy to TaskQueueWrapper. r?ng! → Bug 1748333 - Add DeletionPolicy to TaskQueueWrapper. r=ng
Attachment #9261783 - Attachment description: Bug 1748333: Wait for BeginShutdown to resolve before deleting. r?ng → Bug 1748333: Wait for BeginShutdown to resolve before deleting, regardless of whether we're doing a blocking delete or not. r?ng
Attachment #9261984 - Attachment is obsolete: true

Try looks good.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/da663d369bc8
Add DeletionPolicy to TaskQueueWrapper. r=ng
https://hg.mozilla.org/integration/autoland/rev/75b6d96aaf65
Wait for BeginShutdown to resolve before deleting, regardless of whether we're doing a blocking delete or not. r=ng
Status: REOPENED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch

Bugmon Analysis
Bug marked as FIXED but still reproduces on mozilla-central 20220209095640-99073046b39c. If you believe this to be incorrect, please remove the bugmon keyword to prevent further analysis.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Can you post a stack for comment 28? There may be some different issue we're hitting.

Flags: needinfo?(jkratzer)

:bwc, the testcase does not reproduce very reliably. However, with MOZ_CHAOSMODE=0, I see the same crash described in comment 13.

Flags: needinfo?(jkratzer)

I'm pretty sure that is a separate issue. Could you open a new bug and hook bugmon up to it?

Flags: needinfo?(jkratzer)

I did some analysis on the comment 13 issue in D137577 (and following comments). I thought maybe it got fixed by delaying the dtor there but apparently not?

The problem here is that we are going reentrant because this code first invokes the d'tor, and only when that is done is the Maybe actually cleared out:

https://searchfox.org/mozilla-central/rev/9f61d854547cedbde0773b2893e4f925352be3b3/xpcom/threads/AbstractThread.cpp#196

I guess one solution would be to move the value contained in mTailDispatcher into stack scope, and then allow it to go out of scope for destruction. This is, after all, the way AutoTaskDispatcher was supposed to work anyway; it was intended to be a stack class.

Another (more invasive) solution would be to modify Maybe::reset to ensure that it has cleared its internal value before invoking the d'tor.

Yet another solution might be to make this Maybe a unique_ptr instead (which guarantees that the unset occurs before invocation of the d'tor).
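
The three behaviours discussed above can be compared in a small standalone sketch. Everything here is hypothetical: `SimpleMaybe` is a toy container that deliberately runs the destructor before clearing its flag (the ordering described for the reset call in AbstractThread.cpp), and `Dispatcher` is a stand-in whose destructor re-enters the owner through a callback. It is not the Gecko Maybe or AutoTaskDispatcher.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <new>
#include <utility>

// Toy optional: reset() destroys the value first and clears the flag after.
template <typename T>
class SimpleMaybe {
 public:
  SimpleMaybe() = default;
  SimpleMaybe(const SimpleMaybe&) = delete;
  SimpleMaybe& operator=(const SimpleMaybe&) = delete;
  ~SimpleMaybe() { reset(); }

  template <typename... Args>
  void emplace(Args&&... aArgs) {
    assert(!mIsSome);
    new (mStorage) T(std::forward<Args>(aArgs)...);
    mIsSome = true;
  }
  bool isSome() const { return mIsSome; }
  T& ref() {
    assert(mIsSome);
    return *std::launder(reinterpret_cast<T*>(mStorage));
  }
  void reset() {
    if (mIsSome) {
      ref().~T();       // re-entrant code still sees isSome() == true here
      mIsSome = false;  // cleared only after the destructor has returned
    }
  }

 private:
  alignas(T) unsigned char mStorage[sizeof(T)];
  bool mIsSome = false;
};

// Stand-in for a dispatcher whose destructor calls back into its owner.
class Dispatcher {
 public:
  explicit Dispatcher(std::function<void()> aOnDestroy)
      : mOnDestroy(std::move(aOnDestroy)) {}
  Dispatcher(Dispatcher&& aOther) noexcept
      : mOnDestroy(std::move(aOther.mOnDestroy)) {
    aOther.mOnDestroy = nullptr;  // the moved-from shell stays silent
  }
  ~Dispatcher() {
    if (mOnDestroy) mOnDestroy();
  }

 private:
  std::function<void()> mOnDestroy;
};

// In-place reset: the callback observes a holder that still looks occupied.
bool OccupiedDuringInPlaceReset() {
  SimpleMaybe<Dispatcher> tail;
  bool occupied = false;
  tail.emplace([&] { occupied = tail.isSome(); });
  tail.reset();
  return occupied;
}

// Move the value into stack scope, empty the holder, then let it destruct.
bool OccupiedDuringStackScopedDestroy() {
  SimpleMaybe<Dispatcher> tail;
  bool occupied = true;
  tail.emplace([&] { occupied = tail.isSome(); });
  {
    Dispatcher local(std::move(tail.ref()));
    tail.reset();  // destroys only the silent moved-from shell
  }                // local's destructor runs here, with tail already empty
  return occupied;
}

// std::unique_ptr::reset stores the new pointer before invoking the deleter.
bool OccupiedDuringUniquePtrReset() {
  std::unique_ptr<Dispatcher> tail;
  bool occupied = true;
  tail = std::make_unique<Dispatcher>([&] { occupied = (tail != nullptr); });
  tail.reset();
  return occupied;
}
```

Only the in-place reset lets the re-entrant callback see a holder that still claims to contain a value; the stack-scoped and unique_ptr variants both present an empty holder by the time the destructor runs.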

So far, not seeing new occurrences of bug 1752959 or bug 1752963.

See Also: → 1754867

Filed bug 1754867 for the crash in comment 13.

Flags: needinfo?(jkratzer)

Bugmon Analysis
Testcase crashes using the initial build (mozilla-central 20220103092929-1cb2015e6fbc) but not with tip (mozilla-central 20220225203949-ed7366af6fbf.)
The bug appears to have been fixed in the following build range:

Start: 5300e917c86d0b1e209a231bcbdbe389d0d0b1bc (20220219093323)
End: f6a5db4ee19659e21f1404d9dccb495fffb18bfb (20220219083300)
Pushlog: https://hg.mozilla.org/mozilla-unified/pushloghtml?fromchange=5300e917c86d0b1e209a231bcbdbe389d0d0b1bc&tochange=f6a5db4ee19659e21f1404d9dccb495fffb18bfb
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon

Byron, is it safe to close this now?

Flags: needinfo?(docfaraday)
Status: REOPENED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(docfaraday)
Resolution: --- → DUPLICATE
No longer regressions: 1752963