Closed Bug 1750168 Opened 4 months ago Closed 4 months ago

[clang 14] application crashed [@ Pickle::ReadBool(PickleIterator*, bool*) const]

Categories

(Core :: IPC, defect)

ARM
Android
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- fixed
firefox96 --- wontfix
firefox97 --- fixed
firefox98 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

PGO builds with clang trunk crash on browsertime tests with:

PROCESS-CRASH | pid: None | application crashed [@ Pickle::ReadBool(PickleIterator*, bool*) const]
Crash dump filename: /tmp/tmpj0mg9k_k/2ae09a2f-a162-feb3-1ddb-f452c1f56d57.dmp
Operating system: Android
                  0.0.0 Linux 3.18.31-perf-g708ac5e #1 SMP PREEMPT Fri Jan 13 08:46:35 CST 2017 armv7l
CPU: arm
     ARMv7 ARM part(0x4100d030) features: half,thumb,fastmult,vfpv2,edsp,neon,vfpv3,tls,vfpv4,idiva,idivt
     8 CPUs
Crash reason:  SIGBUS / BUS_ADRALN
Crash address: 0x8dcf9003
Process uptime: not available
Thread 58  (crashed)
 0  libxul.so!Pickle::ReadBool(PickleIterator*, bool*) const [pickle.cc:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 179 + 0x32]
     r0 = 0x8dcf900b    r1 = 0x8dcf900b    r2 = 0x8dcf9003    r3 = 0x00000004
     r4 = 0x0000025c    r5 = 0x7f5bd120    r6 = 0x0000000c    r7 = 0x7f5bd058 
     r8 = 0x7f5bd063    r9 = 0x7a2ec93c   r10 = 0x7a2ec934   r11 = 0x8dcb3580
    r12 = 0x8dcf900b   r13 = 0x7f5bd038   r14 = 0x89360ca9   r15 = 0x892e6326
     pc = 0x892e6326    lr = 0x89360ca9    fp = 0x8dcb3580    sp = 0x7f5bd038
    Found by: given as instruction pointer in context
 1  libxul.so!bool mozilla::ipc::ReadIPDLParam<mozilla::Maybe<mozilla::ipc::ByteBuf> >(IPC::Message const*, PickleIterator*, mozilla::ipc::IProtocol*, mozilla::Maybe<mozilla::ipc::ByteBuf>*) [IPDLParamTraits.h:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 72 + 0x5] 
     r4 = 0x7f5bd180    r5 = 0x7a2ec934    r6 = 0x7a2ec934    r7 = 0x7f5bd078
     r8 = 0x7f5bd120    r9 = 0x00000000   r10 = 0x7a2ec934   r11 = 0x8dcb3580
     pc = 0x89360ca9    sp = 0x7f5bd060
    Found by: call frame info
 2  libxul.so!mozilla::ipc::IPDLParamTraits<mozilla::layers::DisplayListData>::Read(IPC::Message const*, PickleIterator*, mozilla::ipc::IProtocol*, mozilla::layers::DisplayListData*) [RenderRootTypes.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 38 + 0xb]
     r4 = 0x7f5bd120    r5 = 0x7f5bd148    r6 = 0x7a2ec934    r7 = 0x7f5bd090
     r8 = 0x8dcb3580    r9 = 0x00000000   r10 = 0x7a2ec934   r11 = 0x8dcb3580
     pc = 0x898215f1    sp = 0x7f5bd080
    Found by: call frame info
 3  libxul.so!mozilla::layers::PWebRenderBridgeParent::OnMessageReceived(IPC::Message const&) [PWebRenderBridgeParent.cpp: : 324 + 0x5]
     r4 = 0x8bcf0428    r5 = 0x7f5bd148    r6 = 0x8bcd9204    r7 = 0x7f5bd250
     r8 = 0x7a2ec93c    r9 = 0x00000000   r10 = 0x7a2ec934   r11 = 0x8dcb3580
     pc = 0x897534f7    sp = 0x7f5bd098
    Found by: call frame info
 4  libxul.so!mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&) [PCompositorManagerParent.cpp: : 204 + 0xb]
     r4 = 0x8270e040    r5 = 0x00000000    r6 = 0x7a2ec934    r7 = 0x7f5bd2d8
     r8 = 0x7a2ec900    r9 = 0x00000008   r10 = 0x8270e0a4   r11 = 0x8278fe50
     pc = 0x8971e937    sp = 0x7f5bd258
    Found by: call frame info
 5  libxul.so!mozilla::ipc::MessageChannel::MessageTask::Run() [MessageChannel.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 1851 + 0x37b]
     r4 = 0x00000001    r5 = 0x00000000    r6 = 0x00000000    r7 = 0x7f5bd5a8
     r8 = 0x7a2ec900    r9 = 0x00000008   r10 = 0x8270e0a4   r11 = 0x8278fe50 
     pc = 0x892f2cdd    sp = 0x7f5bd2e0
    Found by: call frame info
 6  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:e76c26505b1f6d0887df943a05cd79fdd24fcfd6 : 1189 + 0x7]
     r4 = 0x8dc6d200    r5 = 0x7a2ec900    r6 = 0x7f5bd628    r7 = 0x7f5bd6b0
     r8 = 0xffffffff    r9 = 0x7f5bd778   r10 = 0x00000092   r11 = 0x00000000
     pc = 0x895b8483    sp = 0x7f5bd5b0

This started happening with LLVM commit 5ce89279c0986d0bcbe526dce52f91dd0c16427c, which actually only unveiled preexisting code smell in pickle.cc.

The problem is that Pickle::ReadBool relies on Pickle::ReadInt, and thus will read an entire 32-bits word from the iterator. That would be fine if we didn't end up using a Copier::Copy variant that wants 32-bits alignment. LLVM legitimately ends up using an instruction that does require that much alignment, but at runtime, it doesn't get an aligned pointer, and we end up with a SIGBUS.

This in turn, is due to the hasSufficientAlignment test in PickleIterator::CopyInto being inadequate: MOZ_ALIGNOF(T) <= sizeof(Pickle::memberAlignmentType). In our case T is int, MOZ_ALIGNOF(T) is 4, Pickle::memberAlignmentType is uint32_t, its sizeof is 4. So the test is true, even though we don't have sufficient alignment.

... except this is all a red herring.
AFAICT, the Pickle write code aligns everything to 32-bits, so the bool should be read from a 32-bits aligned pointer in the first place. And adding a validation that the pointer is properly aligned in the Copier::Copy that requires alignment does trigger the crash with the current version of clang we're using (13.0). Even better, it's not ARM or Android-specific. We only see it on Android because as of this LLVM change, the instruction used to read that data requires alignment, which it didn't before and doesn't on other platforms.

Because this is not ARM or Android-specific, applying the same patch to Copier::Copy crashes a Linux x86_64 build very quickly during startup, which means I caught it in rr, and sent it to pernosco: https://pernos.co/debug/IXZ_FuvJp_yaKHB02WKSbQ/index.html

So, in fact, the problem starts well before. It starts here:
https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/ipc/glue/ByteBufUtils.h#29
which calls into https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/ipc/chromium/src/base/pickle.cc#608

So, we're inserting a buffer in the buffer list and that buffer has a size of 185. The EndWrite adds padding to align... except because we've actually inserted a buffer in the buffer list, we're not adding another buffer and adding enough bytes to re-align to 32-bits, which is 3... so from the buffer-list perspective, we're aligned, but from the immediate buffer perspective, we're not aligned anymore.

This probably only affects intra-process IPC. Inter-process IPC is likely to have a single buffer, or have power-of-two sized buffers in its bufferlist if it uses one.

Set release status flags based on info from the regressing bug 1379680

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/d09e050e686b
Align the size of buffers given to WriteBytesZeroCopy. r=nika
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

The patch landed in nightly and beta is affected.
:glandium, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(mh+mozilla)

Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.

Beta/Release Uplift Approval Request

  • User impact if declined: For Mozilla.org, it doesn't makes much difference, but for downstreams building Firefox from source, it may prevent crashes if the compiler decides to use instructions that require aligned memory accesses.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch makes the IPC serialization expand (realloc) buffers it is given ownership of. In case our memory allocator is used, we avoid calling realloc because the way memory is allocated guarantees those reallocs to always happen in-place. IOW, the effects of the patch are well understood.
  • String changes made/needed: N/A

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: See above
  • User impact if declined:
  • Fix Landed on Version: 98
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
Flags: needinfo?(mh+mozilla)
Attachment #9259072 - Flags: approval-mozilla-esr91?
Attachment #9259072 - Flags: approval-mozilla-beta?

Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.

Approved for 97.0b6.

Attachment #9259072 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9259072 [details]
Bug 1750168 - Align the size of buffers given to WriteBytesZeroCopy.

Approved for 91.6esr.

Attachment #9259072 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Has Regression Range: --- → yes
Blocks: 1758780
You need to log in before you can comment on or make changes to this bug.