[clang 14] application crashed [@ Pickle::ReadBool(PickleIterator*, bool*) const]
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: glandium, Assigned: glandium)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr91+
|
Details | Review |
PGO builds with clang trunk crash on browsertime tests with:
PROCESS-CRASH | pid: None | application crashed [@ Pickle::ReadBool(PickleIterator*, bool*) const]
Crash dump filename: /tmp/tmpj0mg9k_k/2ae09a2f-a162-feb3-1ddb-f452c1f56d57.dmp
Operating system: Android
0.0.0 Linux 3.18.31-perf-g708ac5e #1 SMP PREEMPT Fri Jan 13 08:46:35 CST 2017 armv7l
CPU: arm
ARMv7 ARM part(0x4100d030) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
8 CPUs
Crash reason: SIGBUS / BUS_ADRALN
Crash address: 0x8dcf9003
Process uptime: not available
Thread 58 (crashed)
0 libxul.so!Pickle::ReadBool(PickleIterator*, bool*) const [pickle.cc:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 179 + 0x32]
r0 = 0x8dcf900b r1 = 0x8dcf900b r2 = 0x8dcf9003 r3 = 0x00000004
r4 = 0x0000025c r5 = 0x7f5bd120 r6 = 0x0000000c r7 = 0x7f5bd058
r8 = 0x7f5bd063 r9 = 0x7a2ec93c r10 = 0x7a2ec934 r11 = 0x8dcb3580
r12 = 0x8dcf900b r13 = 0x7f5bd038 r14 = 0x89360ca9 r15 = 0x892e6326
pc = 0x892e6326 lr = 0x89360ca9 fp = 0x8dcb3580 sp = 0x7f5bd038
Found by: given as instruction pointer in context
1 libxul.so!bool mozilla::ipc::ReadIPDLParam<mozilla::Maybe<mozilla::ipc::ByteBuf> >(IPC::Message const*, PickleIterator*, mozilla::ipc::IProtocol*, mozilla::Maybe<mozilla::ipc::ByteBuf>*) [IPDLParamTraits.h:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 72 + 0x5]
r4 = 0x7f5bd180 r5 = 0x7a2ec934 r6 = 0x7a2ec934 r7 = 0x7f5bd078
r8 = 0x7f5bd120 r9 = 0x00000000 r10 = 0x7a2ec934 r11 = 0x8dcb3580
pc = 0x89360ca9 sp = 0x7f5bd060
Found by: call frame info
2 libxul.so!mozilla::ipc::IPDLParamTraits<mozilla::layers::DisplayListData>::Read(IPC::Message const*, PickleIterator*, mozilla::ipc::IProtocol*, mozilla::layers::DisplayListData*) [RenderRootTypes.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 38 + 0xb]
r4 = 0x7f5bd120 r5 = 0x7f5bd148 r6 = 0x7a2ec934 r7 = 0x7f5bd090
r8 = 0x8dcb3580 r9 = 0x00000000 r10 = 0x7a2ec934 r11 = 0x8dcb3580
pc = 0x898215f1 sp = 0x7f5bd080
Found by: call frame info
3 libxul.so!mozilla::layers::PWebRenderBridgeParent::OnMessageReceived(IPC::Message const&) [PWebRenderBridgeParent.cpp: : 324 + 0x5]
r4 = 0x8bcf0428 r5 = 0x7f5bd148 r6 = 0x8bcd9204 r7 = 0x7f5bd250
r8 = 0x7a2ec93c r9 = 0x00000000 r10 = 0x7a2ec934 r11 = 0x8dcb3580
pc = 0x897534f7 sp = 0x7f5bd098
Found by: call frame info
4 libxul.so!mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&) [PCompositorManagerParent.cpp: : 204 + 0xb]
r4 = 0x8270e040 r5 = 0x00000000 r6 = 0x7a2ec934 r7 = 0x7f5bd2d8
r8 = 0x7a2ec900 r9 = 0x00000008 r10 = 0x8270e0a4 r11 = 0x8278fe50
pc = 0x8971e937 sp = 0x7f5bd258
Found by: call frame info
5 libxul.so!mozilla::ipc::MessageChannel::MessageTask::Run() [MessageChannel.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 1851 + 0x37b]
r4 = 0x00000001 r5 = 0x00000000 r6 = 0x00000000 r7 = 0x7f5bd5a8
r8 = 0x7a2ec900 r9 = 0x00000008 r10 = 0x8270e0a4 r11 = 0x8278fe50
pc = 0x892f2cdd sp = 0x7f5bd2e0
Found by: call frame info
6 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 1189 + 0x7]
r4 = 0x8dc6d200 r5 = 0x7a2ec900 r6 = 0x7f5bd628 r7 = 0x7f5bd6b0
r8 = 0xffffffff r9 = 0x7f5bd778 r10 = 0x00000092 r11 = 0x00000000
pc = 0x895b8483 sp = 0x7f5bd5b0
This started happening with LLVM commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c, which actually only unveiled preexisting code smell in pickle.cc.
The problem is that Pickle::ReadBool
relies on Pickle::ReadInt
, and thus will read an entire 32-bits word from the iterator. That would be fine if we didn't end up using a Copier::Copy
variant that wants 32-bits alignment. LLVM legitimately ends up using an instruction that does require that much alignment, but at runtime, it doesn't get an aligned pointer, and we end up with a SIGBUS.
This in turn, is due to the hasSufficientAlignment
test in PickleIterator::CopyInto
being inadequate: MOZ_ALIGNOF(T) <= sizeof(Pickle::memberAlignmentType)
. In our case T is int, MOZ_ALIGNOF(T) is 4, Pickle::memberAlignmentType
is uint32_t
, its sizeof is 4. So the test is true, even though we don't have sufficient alignment.
... except this is all a red herring.
AFAICT, the Pickle write code aligns everything to 32-bits, so the bool should be read from a 32-bits aligned pointer in the first place. And adding a validation that the pointer is properly aligned in the Copier::Copy
that requires alignment does trigger the crash with the current version of clang we're using (13.0). Even better, it's not ARM or Android-specific. We only see it on Android because as of this LLVM change, the instruction used to read that data requires alignment, which it didn't before and doesn't on other platforms.
Because this is not ARM or Android-specific, applying the same patch to Copier::Copy
crashes a Linux x86_64 build very quickly during startup, which means I caught it in rr, and sent it to pernosco: https://pernos.co/debug/IXZ_FuvJp_yaKHB02WKSbQ/index.html
So, in fact, the problem starts well before. It starts here:
https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/ipc/glue/ByteBufUtils.h#29
which calls into https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/ipc/chromium/src/base/pickle.cc#608
So, we're inserting a buffer in the buffer list and that buffer has a size of 185. The EndWrite adds padding to align... except because we've actually inserted a buffer in the buffer list, we're not adding another buffer and adding enough bytes to re-align to 32-bits, which is 3... so from the buffer-list perspective, we're aligned, but from the immediate buffer perspective, we're not aligned anymore.
This probably only affects intra-process IPC. Inter-process IPC is likely to have a single buffer, or have power-of-two sized buffers in its bufferlist if it uses one.
Assignee | ||
Comment 1•1 year ago
|
||
Updated•1 year ago
|
Comment 2•1 year ago
|
||
Set release status flags based on info from the regressing bug 1379680
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/d09e050e686b Align the size of buffers given to WriteBytesZeroCopy. r=nika
Comment 4•1 year ago
|
||
bugherder |
Comment 5•1 year ago
|
||
The patch landed in nightly and beta is affected.
:glandium, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 6•1 year ago
|
||
Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.
Beta/Release Uplift Approval Request
- User impact if declined: For Mozilla.org, it doesn't makes much difference, but for downstreams building Firefox from source, it may prevent crashes if the compiler decides to use instructions that require aligned memory accesses.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch makes the IPC serialization expand (realloc) buffers it is given ownership of. In case our memory allocator is used, we avoid calling realloc because the way memory is allocated guarantees those reallocs to always happen in-place. IOW, the effects of the patch are well understood.
- String changes made/needed: N/A
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: See above
- User impact if declined:
- Fix Landed on Version: 98
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
Updated•1 year ago
|
Comment 7•1 year ago
|
||
Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.
Approved for 97.0b6.
Comment 8•1 year ago
|
||
bugherderuplift |
Comment 9•1 year ago
|
||
Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.
Approved for 91.6esr.
Comment 10•1 year ago
|
||
bugherderuplift |
Updated•1 year ago
|
Description
•