Closed Bug 1321739 Opened 8 years ago Closed 4 years ago

Copy Image crash (dead beef, it's what's for dinner)

Categories

(Core :: Graphics, defect, P3)

All
macOS
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox51 --- affected
firefox52 --- affected
firefox53 --- affected
firefox54 --- affected

People

(Reporter: Dolske, Unassigned)

Details

(Whiteboard: [gfx-noted])

1) Load https://i.imgur.com/9qtVISH.jpg
2) Savor the deliciousness.
3) Right click, Copy Image
4) Crash. >_< (content process)

This is easily reproducible for me. OS X Nightly, 11/30.

bp-9bfddbb3-4765-4ecb-8945-1151b2161202
bp-66f741bc-3fab-4b29-a162-da7a32161202
bp-931c2eaf-f2e1-461c-a70a-db0672161202

Crash-stats says the signature is "IPCError-browser | (msgtype=0xFFFB) Payload error: message could not be deserialized". I don't quite understand why it's saying that, though, since the actual crash stack doesn't have anything to do with IPC? Is that what the parent process sees?

0) _platform_memmove$VARIANT$Haswell
1) nsContentUtils::GetSurfaceData()
2) nsContentUtils::TransferableToIPCTransferable()
3) nsClipboardProxy::SetData()
4) nsCopySupport::ImageCopy()
...

Note that the example image is 7952 × 5304, so the bitmap that's being placed into the clipboard is going to be quite large.
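For a rough sense of scale, the uncompressed size can be estimated as below. This is a back-of-the-envelope sketch that assumes 4 bytes per pixel (for example a BGRA surface) and no row padding; the real format and stride used by nsContentUtils::GetSurfaceData() may differ, and `bitmap_size_bytes` is a made-up helper name.

```python
# Rough estimate of the uncompressed bitmap size for the example image.
# Assumption: 4 bytes per pixel and tightly packed rows (no stride padding).

BYTES_PER_PIXEL = 4  # assumed, not taken from the crash data


def bitmap_size_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return the size in bytes of a tightly packed width x height bitmap."""
    return width * height * bytes_per_pixel


size = bitmap_size_bytes(7952, 5304)
print(f"{size} bytes, about {size / 2**20:.1f} MiB")
```

Under those assumptions the bitmap comes to 168,709,632 bytes, or roughly 161 MiB.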

Bug 1272018 touched this code a few months ago.
Also, one of my crashes was bp-7c669035-1637-4883-9bd0-9a5132161202, which seems to be in the middle of GC?!
Has STR: --- → yes
Priority: -- → P3
Whiteboard: [gfx-noted]
I am on Linux, and the content process shows the following error message in stdout/stderr:

nouveau: kernel rejected pushbuf: Cannot allocate memory
nouveau: ch9: krec 0 pushes 1 bufs 15 relocs 0
nouveau: ch9: buf 00000000 00000003 00000004 00000004 00000000
nouveau: ch9: buf 00000001 00000012 00000002 00000002 00000000
nouveau: ch9: buf 00000002 00000007 00000002 00000002 00000000
nouveau: ch9: buf 00000003 00000008 00000002 00000002 00000002
nouveau: ch9: buf 00000004 0000000b 00000002 00000002 00000000
nouveau: ch9: buf 00000005 0000000a 00000002 00000002 00000002
nouveau: ch9: buf 00000006 00000006 00000004 00000000 00000004
nouveau: ch9: buf 00000007 0000001d 00000002 00000000 00000002
nouveau: ch9: buf 00000008 00000015 00000002 00000002 00000000
nouveau: ch9: buf 00000009 00000011 00000002 00000002 00000000
nouveau: ch9: buf 0000000a 0000001e 00000002 00000002 00000000
nouveau: ch9: buf 0000000b 00000022 00000002 00000002 00000000
nouveau: ch9: buf 0000000c 00000024 00000002 00000002 00000000
nouveau: ch9: buf 0000000d 00000014 00000002 00000002 00000000
nouveau: ch9: buf 0000000e 00000026 00000002 00000002 00000000
nouveau: ch9: psh 00000000 00000038b4 0000009494
nouveau:        0x20040360

It even crashed my X server once during testing when the content process crashed, but it was hard to reproduce on my workstation when I tried to catch it with gdb.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #2)
> I am on Linux, and the content process shows the following error message in
> stdout/stderr:

The error was output in the *parent* process.
I reproduced this on a Mac. The child process is killed by the parent. The child process successfully created the shared memory region and shared the handle with the parent. The parent process read the handle and tried to map the shared memory region, but failed in the mach_vm_map() call in
https://dxr.mozilla.org/mozilla-central/rev/8103c612b79c2587ea4ca1b0a9f9f82db4b185b8/ipc/glue/SharedMemoryBasic_mach.mm#577
with mach_error_t == 4 "(os/kern) invalid argument". I am still investigating why we got this error on large images.
It turns out that there is a hidden 128 MiB limit on the size of a shared memory region on Mac. The content process successfully allocates the memory region, makes a memory entry and sends the mach port to the parent process. The parent process successfully receives the port, but mach_vm_map() fails with KERN_INVALID_ARGUMENT when the parent tries to map a shared memory region larger than 128 MiB.

To really fix this bug, we need to send multiple smaller shared memory regions instead of a large contiguous one.
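To picture the chunking idea, here is a minimal sketch of the planning step only: it splits a buffer into (offset, length) pieces that each fit under a cap. It assumes the 128 MiB figure above is the per-region cap and that the bitmap uses 4 bytes per pixel. `plan_chunks` is a hypothetical helper, not existing Gecko code, and none of the actual IPC plumbing is shown.

```python
# Sketch: plan (offset, length) chunks so that no single region exceeds a cap.
# Assumption: the per-region cap is the 128 MiB limit observed in this bug.
from typing import List, Tuple

MAX_REGION_BYTES = 128 * 1024 * 1024


def plan_chunks(total_bytes: int, max_region: int = MAX_REGION_BYTES) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs covering total_bytes, each length <= max_region."""
    if total_bytes < 0 or max_region <= 0:
        raise ValueError("total_bytes must be >= 0 and max_region must be > 0")
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(max_region, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


# 7952 x 5304 pixels at an assumed 4 bytes per pixel.
image_bytes = 7952 * 5304 * 4
for offset, length in plan_chunks(image_bytes):
    print(offset, length)
```

With those assumptions the example image would need two regions: one of 134,217,728 bytes and one of 34,491,904 bytes.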
I also ran into this on Fx 51.0.1, Fx 52 beta 9, the latest DevEdition 53.0a2 and the latest Nightly 54.0a1 on macOS 10.12.3. Windows and Linux are not affected (marking accordingly).
OS: Unspecified → Mac OS X
Hardware: Unspecified → All
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #5)

> To really fix this bug, we need to send multiple smaller shared memory
> regions instead of a large contiguous one.

I don't know how hard that is, but we could also consider a band-aid: just fail the attempt when we know the request would exceed the size limit.
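
Such a check might look roughly like the following. This is an illustrative sketch rather than Gecko code: `can_share_in_one_region` is a hypothetical name, the 4 bytes per pixel figure is an assumption, and the 128 MiB cap is the limit reported earlier in this bug, not a documented constant.

```python
# Sketch: refuse up front when a request would exceed the per-region cap.
# Assumption: the cap is the 128 MiB limit observed in this bug.

MAX_REGION_BYTES = 128 * 1024 * 1024


def can_share_in_one_region(requested_bytes, max_region=MAX_REGION_BYTES):
    """Return True if the request is non-empty and fits in a single region."""
    return 0 < requested_bytes <= max_region


# 7952 x 5304 pixels at an assumed 4 bytes per pixel.
image_bytes = 7952 * 5304 * 4
if not can_share_in_one_region(image_bytes):
    print("Copy Image skipped: bitmap too large for a single shared memory region")
```

Failing early like this would trade the crash for a copy that silently does nothing, so it is only a stopgap.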

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME