Figure out how to avoid bug 1596826
Categories
(Core :: Widget: Cocoa, enhancement)
Tracking
()
People
(Reporter: jrmuizel, Assigned: jrmuizel)
Details
Assignee
Updated•4 years ago
Assignee
Comment 1•4 years ago
So I was able to trigger this crash more often by expanding the customizeui test suite. It also looks like it's specifically https://bugzilla.mozilla.org/attachment.cgi?id=9105894 that causes the crashes to start happening.
Assignee
Comment 2•4 years ago
I tried disabling layer recycling and still got the crash: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c7a80eb5ff7aefe8e3bd3cb6fcb6ad87964a9f3f
Assignee
Comment 3•4 years ago
I took a closer look at where the most common crash in [NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] is happening.
Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0xffffffffe5e5e5f8
Process uptime: 744 seconds
Thread 0 (crashed)
 0 libobjc.A.dylib!objc_msgSend + 0x1d
    rax = 0x00007fff429e6503 rdx = 0x0000000000000002
    rcx = 0x00000001265ad430 rbx = 0x00000001265ad430
    rsi = 0x00007fff40ac4929 rdi = 0x00000001265ad430
    rbp = 0x00007ffee90f4c40 rsp = 0x00007ffee90f4c28
    r8 = 0x000000012350faa0 r9 = 0x0000000106c00110
    r10 = 0x000065e5e5e5e5e0 r11 = 0x00007fff40ac4929
    r12 = 0x0000000000000003 r13 = 0x0000000000000004
    r14 = 0x0000000000000000 r15 = 0x00007fff40b1c77d
    rip = 0x00007fff6d00a69d
    Found by: given as instruction pointer in context
 1 AppKit!-[NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] + 0xbed
    rbp = 0x00007ffee90f50d0 rsp = 0x00007ffee90f4c50
    rip = 0x00007fff4004963b
    Found by: previous frame's frame pointer
 2 AppKit!-[NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] + 0x4c9
    rbp = 0x00007ffee90f5560 rsp = 0x00007ffee90f50e0
    rip = 0x00007fff40048f17
    Found by: previous frame's frame pointer
 3 AppKit!-[NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] + 0x4c9
    rbp = 0x00007ffee90f59f0 rsp = 0x00007ffee90f5570
    rip = 0x00007fff40048f17
    Found by: previous frame's frame pointer
 4 AppKit!-[NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] + 0x4c9
    rbp = 0x00007ffee90f5e80 rsp = 0x00007ffee90f5a00
    rip = 0x00007fff40048f17
The crash happens in a call to _objc_enumerationMutation, which tail-calls the enumeration mutation handler, most likely ___NSFastEnumerationMutationHandler. ___NSFastEnumerationMutationHandler calls _objc_msgSend early on, but it's very weird that ___NSFastEnumerationMutationHandler does not appear on the stack.
Assignee
Comment 4•4 years ago
Some more information on what we're enumerating:
We call "sublayers" and then call "countByEnumeratingWithState:objects:count:" on the result to get a count. If the count != 0, we seem to enumerate the result, and it is during that enumeration that we run into the mutation problem.
Assignee
Comment 5•4 years ago
The other crash in [NSView buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:] is from calling "count" on the return value of "buildLayerTreeWithOwnLayerRequirement:someAncestorWantsLayer:"
Assignee
Comment 6•4 years ago
I'll try setting my own enumeration mutation handler and see if I can make the crash show up there instead.
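As a rough illustration of the mechanism discussed in the comments above, here is a minimal C++ sketch of mutation-checked enumeration with a replaceable handler. It is only a model: `CountedList`, `EnumerateChecked`, `gHandler`, and `CountingHandler` are hypothetical names, and this is not how AppKit or the Objective-C runtime implement fast enumeration. In the real runtime, a custom handler would be installed with objc_setEnumerationMutationHandler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a collection that supports checked enumeration.
// It is not CALayer, NSArray, or any other AppKit type.
struct CountedList {
  std::vector<int> items;
  unsigned long mutations = 0;  // bumped on every structural change

  void Add(int value) {
    items.push_back(value);
    ++mutations;
  }
};

// Replaceable handler, loosely analogous to the function that
// objc_setEnumerationMutationHandler installs in the Objective-C runtime.
using MutationHandler = void (*)(const CountedList&);

static int gMutationsSeen = 0;
static void CountingHandler(const CountedList&) { ++gMutationsSeen; }
static MutationHandler gHandler = CountingHandler;

// Visits each item, comparing the mutation counter against the value seen at
// the start before every step. Returns the number of items visited before a
// mutation was detected (or all of them if none was).
template <typename Visitor>
std::size_t EnumerateChecked(CountedList& list, Visitor visit) {
  assert(gHandler != nullptr);
  const unsigned long startMutations = list.mutations;
  std::size_t visited = 0;
  for (std::size_t i = 0; i < list.items.size(); ++i) {
    if (list.mutations != startMutations) {
      gHandler(list);  // report the mutation instead of continuing
      break;
    }
    const int value = list.items[i];  // copy, so the visitor may mutate safely
    visit(value);
    ++visited;
  }
  return visited;
}
```

Routing detection through a handler under our control means a mutation shows up as a call into our own code (here, a counter increment) rather than somewhere inside the system's default handler.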
Comment 7•4 years ago
Great find! __NSFastEnumerationMutationHandler should be throwing an exception, not crashing...
But onto the actual problem! We definitely mutate the sublayers array on the compositor thread. The crashing code runs on the main thread. And I don't see anything that synchronizes between them. This is bound to cause problems.
+[CATransaction lock] and +[CATransaction unlock] exist, but they usually only get called inside the CALayer property getters and setters. Once a property (e.g. sublayers) has been read, the caller is outside the lock.
So I'm not sure how this is supposed to work, at all.
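To make the lock-scope concern concrete, here is a small C++ sketch of one common way to avoid this class of race: copy the list while the lock is held and enumerate the copy. `LayerModel`, `SublayersSnapshot`, and `EnumerateSnapshotWhileMutating` are hypothetical names, the mutation is interleaved on a single thread to stand in for the compositor thread, and nothing here is claimed to match what Core Animation or the patch in bug 1644940 actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model of a layer whose sublayer list is guarded by a lock.
class LayerModel {
 public:
  void AddSublayer(int id) {
    std::lock_guard<std::mutex> guard(mLock);
    mSublayers.push_back(id);
  }

  // Copies the list while the lock is held. The caller enumerates the copy
  // after the lock is released, so later mutations cannot affect it.
  std::vector<int> SublayersSnapshot() const {
    std::lock_guard<std::mutex> guard(mLock);
    return mSublayers;
  }

 private:
  mutable std::mutex mLock;
  std::vector<int> mSublayers;
};

// Enumerates a snapshot while the underlying list is mutated mid-loop.
// The mutation stands in for what another thread could do at that point.
// Returns the number of items visited.
std::size_t EnumerateSnapshotWhileMutating(LayerModel& layer) {
  const std::vector<int> snapshot = layer.SublayersSnapshot();
  std::size_t visited = 0;
  for (int id : snapshot) {
    if (id == 2) {
      layer.AddSublayer(99);  // changes the live list, not the snapshot
    }
    ++visited;
  }
  assert(visited == snapshot.size());
  return visited;
}
```

If the getter returned a reference to the live list instead of a copy, the lock would already be released by the time the loop runs, which is the gap described above.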
Comment 8•4 years ago
Can you check whether the patch in bug 1644940 helps?
Assignee
Comment 9•4 years ago
Assignee
Comment 10•4 years ago
The patch from bug 1644940 seems to fix the crash.
Comment 11•4 years ago
🎉🎉🎉
Comment 12•4 years ago
It would still be nice to find out why exceptions aren't happening. Maybe due to the same phenomenon as bug 1392431?
Assignee
Comment 13•4 years ago
Confirmed that it was mutation problems: https://firefoxci.taskcluster-artifacts.net/WELrvI2UQZab9GiX8Lfnrw/0/public/logs/live_backing.log
Still don't understand the callstack though.
Assignee
Comment 14•4 years ago
This is figured out enough.