Open Bug 1704526 Opened 4 years ago Updated 7 months ago

Crash when running with WGPU_TRACE environment

Status: NEW

People

(Reporter: kvark, Unassigned)

References

(Blocks 1 open bug)

Attachments

(2 files)

trace-log.txt (text/plain, 839.08 KB), attached 4 years ago by Dzmitry Malyshau [:kvark]
testPerfWebGPU.html (text/html, 15.63 KB), attached 4 years ago by Dzmitry Malyshau [:kvark]

Description, Dzmitry Malyshau [:kvark] (Reporter), 4 years ago

Attached file: trace-log.txt

A working WGPU_TRACE is essential for debugging issues. With it enabled, the browser now appears to crash, rather mysteriously, on a specific page (attached). Logs are also attached.

Trying to catch the crash in either the child or the GPU process doesn't yield anything. The crash seems to originate from the IPC thread, and the first relevant log messages are:

[Parent 664563, IPC I/O Parent] WARNING: Message needs unreceived descriptors channel:7f1e574d8d00 message-type:65531 header()->num_fds:1 num_fds:0 fds_i:0: file /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:507

###!!! [Child][MessageChannel] Error: (msgtype=0xB40001,name=PWebGPU::Msg_DeviceAction) Channel error: cannot send/recv

[2021-04-10T17:30:56Z INFO wgpu_core::device] Created buffer Valid((2173, 1, Vulkan)) with BufferDescriptor { label: None, size: 576, usage: VERTEX, mapped_at_creation: true }
[Child 664650, IPC I/O Child] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc:19

Comment 1, Dzmitry Malyshau [:kvark] (Reporter), 4 years ago

Attached file: testPerfWebGPU.html

Attaching the test case.

Comment 2, Dzmitry Malyshau [:kvark] (Reporter), 4 years ago

Decoding the message type from the first warning gives 65531 = SHMEM_CREATED_MESSAGE.
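For readers following the two numeric message types in the logs, here is a minimal sketch of the arithmetic. It assumes a message type packs a protocol id in its upper 16 bits and a per-protocol message id in its lower 16 bits; that layout is an assumption for illustration and is not stated in this bug. The helper names `split_msgtype` and `offset_from_u16_max` are hypothetical.

```rust
/// Splits a 32-bit message type into (protocol id, message id).
/// Assumption: the protocol id occupies the upper 16 bits.
fn split_msgtype(msgtype: u32) -> (u16, u16) {
    ((msgtype >> 16) as u16, (msgtype & 0xFFFF) as u16)
}

/// Distance of a 16-bit id below the largest 16-bit value.
fn offset_from_u16_max(id: u16) -> u16 {
    u16::MAX - id
}

fn main() {
    // msgtype=0xB40001 from the PWebGPU::Msg_DeviceAction error line.
    let (protocol, message) = split_msgtype(0x00B4_0001);
    println!("protocol id {protocol:#x}, message id {message}");

    // message-type:65531 from the "unreceived descriptors" warning.
    let special: u16 = 65531;
    println!("{special} = {special:#x} = u16::MAX - {}", offset_from_u16_max(special));
}
```

Under that assumption, 65531 sits just below the top of the 16-bit range with a zero protocol id, which would fit a reserved, protocol-independent message such as the shmem-created notification.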

Comment 3, Dzmitry Malyshau [:kvark] (Reporter), 4 years ago

This doesn't seem to be a crash after all. Adding an assertion there gives me a proper call stack:

Assertion failure: false, at /mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:498
#01: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (/mnt/code/firefox/_webgpu/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:828)
#02: base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_pump_libevent.cc:251)
#03: event_process_active_single_queue (/mnt/code/firefox/_webgpu/ipc/chromium/src/third_party/libevent/event.c:1639)
#04: event_base_loop (/mnt/code/firefox/_webgpu/ipc/chromium/src/third_party/libevent/event.c:1961)
#05: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_pump_libevent.cc:0)
#06: MessageLoop::RunInternal() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_loop.cc:0)
#07: MessageLoop::Run() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/message_loop.cc:311)
#08: base::Thread::ThreadMain() (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/thread.cc:194)
#09: ThreadFunc(void*) (/mnt/code/firefox/_webgpu/ipc/chromium/src/base/platform_thread_posix.cc:41)

Comment 4, Dzmitry Malyshau [:kvark] (Reporter), 4 years ago

Here are a few more details:

- The application creates 3000 * N resources at once, and many of them end up with an associated Shmem on our side.
- We aren't freeing them nearly as fast as we are asked to create them.
- When WGPU_TRACE is enabled, we are also writing many files in the GPU/parent process, about 5000 of them. All of them are written with std::fs in Rust, and all of these handles are closed.

Perhaps, we are running into some kind of file descriptor exhaustion?
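One way to test this hypothesis on Linux is to compare the number of descriptors the process currently holds against its limit. The following is a rough sketch, not code from Firefox or wgpu: it assumes a Linux `/proc` filesystem, and the function names `parse_max_open_files` and `count_open_fds` are made up for illustration.

```rust
use std::fs;

/// Extracts the (soft, hard) "Max open files" limits from text in the
/// format of /proc/self/limits. Returns None if the line is missing or
/// a limit is not a number (for example "unlimited").
fn parse_max_open_files(limits: &str) -> Option<(u64, u64)> {
    let rest = limits
        .lines()
        .find_map(|line| line.strip_prefix("Max open files"))?;
    let mut fields = rest.split_whitespace();
    let soft: u64 = fields.next()?.parse().ok()?;
    let hard: u64 = fields.next()?.parse().ok()?;
    Some((soft, hard))
}

/// Counts the descriptors currently open in this process (Linux only).
/// The count includes the descriptor used to list the directory itself.
fn count_open_fds() -> std::io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

fn main() {
    match fs::read_to_string("/proc/self/limits") {
        Ok(text) => println!("limits (soft, hard): {:?}", parse_max_open_files(&text)),
        Err(e) => println!("could not read limits: {e}"),
    }
    match count_open_fds() {
        Ok(n) => println!("open fds: {n}"),
        Err(e) => println!("could not list fds: {e}"),
    }
}
```

Logging these two numbers around the burst of resource creation would show whether the open-descriptor count approaches the soft limit when the IPC warnings appear.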

Comment 5, Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧, 4 years ago

(In reply to Dzmitry Malyshau [:kvark] from comment #4)
> Here is a few more details of the story:
>
> the application creates 3000 * N resources at once, many end up with an associated Shmem on our side
>
> we aren't freeing them nearly as fast as we are asked to create them
>
> when WGPU_TRACE is enabled, we are also writing down many files in the GPU/parent process. ~5000 of them. All of them are written with std::fs in Rust, and all of these handles are closed.
>
> Perhaps, we are running into some kind of file descriptor exhaustion?

That is likely to be the problem. We limit each process to 4096 file descriptors, and the last time this was investigated we found that some Linux distros have a hard limit of 4096, so we can't raise it any further there.

This probably would have been more immediately obvious if not for a lack of checks for fd exhaustion in some relevant places.

Shmem will automatically close its file descriptor after it's been mapped in each of the two processes involved, but if a large number are created at once, we could have a large peak number of fds. (Note that Linux also has a limit of 64k virtual memory areas due to annoying historical issues, and bug 1700687 reveals that we're already reaching ⅓ of that limit.)

It's going to be necessary to merge these resources into fewer shared memory segments somehow.
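As a rough illustration of what merging could look like, the sketch below packs many small allocations into a few large segments with a simple bump allocator. It is a toy model and not the Firefox Shmem API: no shared memory is actually created, `SegmentPool` and `Slice` are hypothetical names, and the 1 MiB segment size and 64-byte alignment are assumptions. The 576-byte buffer size and the count of 3000 come from the log and comment 4.

```rust
/// A sub-range of one large segment, handed out to a single resource.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Slice {
    segment: usize, // index of the backing segment
    offset: usize,  // byte offset inside that segment
    size: usize,    // length of the slice in bytes
}

/// Bump allocator that packs small allocations into few large segments.
struct SegmentPool {
    segment_size: usize,
    align: usize,
    used: Vec<usize>, // bytes consumed so far in each segment
}

/// Rounds `n` up to the next multiple of `align` (a power of two).
fn align_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

impl SegmentPool {
    fn new(segment_size: usize, align: usize) -> Self {
        assert!(align.is_power_of_two());
        SegmentPool { segment_size, align, used: Vec::new() }
    }

    /// Returns a slice of the newest segment, opening a new segment only
    /// when the request does not fit. Requests larger than one segment
    /// are rejected; a real design would give them a dedicated segment.
    fn alloc(&mut self, size: usize) -> Option<Slice> {
        if size == 0 || size > self.segment_size {
            return None;
        }
        if let Some(last) = self.used.len().checked_sub(1) {
            let offset = align_up(self.used[last], self.align);
            if offset + size <= self.segment_size {
                self.used[last] = offset + size;
                return Some(Slice { segment: last, offset, size });
            }
        }
        self.used.push(size);
        Some(Slice { segment: self.used.len() - 1, offset: 0, size })
    }

    /// Number of segments, i.e. the number of descriptors this model needs.
    fn segments(&self) -> usize {
        self.used.len()
    }
}

fn main() {
    let mut pool = SegmentPool::new(1 << 20, 64);
    for _ in 0..3000 {
        pool.alloc(576).expect("576 bytes fits in a 1 MiB segment");
    }
    println!("3000 buffers packed into {} segments", pool.segments());
}
```

In this model the peak descriptor count scales with the number of segments rather than the number of resources. A bump allocator never reuses freed space, so a real implementation would also need a way to free or recycle slices.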

Updated by Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧, 4 years ago

Comment 6, 3 years ago

I no longer think this should be blocking MVP. It doesn't affect users.

Blocks: webgpu-v1
No longer blocks: webgpu-mvp

Updated by Erich Gubler [:ErichDonGubler], 1 year ago

Priority: -- → P3

Updated by Brad Werth [:bradwerth], 1 year ago

Blocks: webgpu-phase-2
No longer blocks: webgpu-v1

Updated by Jim Blandy :jimb, 7 months ago

Blocks: webgpu-triage

Updated by Jim Blandy :jimb, 7 months ago

No longer blocks: webgpu-triage

Priority: P3 → P2

Categories

(Core :: Graphics: WebGPU, defect, P2)
