Closed Bug 1995582 Opened 5 months ago Closed 27 days ago

Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, LLVM 21, on a third-party release (a Nix Package)

Categories

(Core :: Graphics: ImageLib, defect)

Unspecified
macOS
defect

RESOLVED FIXED

People

(Reporter: smichaud, Unassigned)

References

(Blocks 1 open bug)

These have been around for a while in small numbers, but they've increased rather dramatically on FF 144. Since they're macOS only, they're probably not a Mozilla bug. But some change on the 144 branch may have "encouraged" them. They're rare enough that we don't have any crashes in nightlies, or even betas. So it will be hard to tell exactly what that "encouragement" was.

This crash signature was also reported in bug 614144. But that bug is very old, and seems to have been fixed long ago.

I just noticed that almost all of these crashes are on the "default" release channel. On the trunk it'd mean local builds. I'm not sure what it means here. Maybe these crashes are a fluke.

bp-65141841-478f-47a9-9ba8-513240251021

Typical crash stack:

Crashing Thread (0), Name: MainThread
Frame  Module  Signature  Source  Trust
0  XUL  nsPNGEncoder::ConvertHostARGBRow(unsigned char const*, unsigned char*, unsigned int, bool)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/encoders/png/nsPNGEncoder.cpp:689  inlined
0  XUL  nsPNGEncoder::AddImageFrame(unsigned char const*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, nsTSubstring<char16_t> const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/encoders/png/nsPNGEncoder.cpp:287  context
1  XUL  nsPNGEncoder::InitFromData(unsigned char const*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, nsTSubstring<char16_t> const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/encoders/png/nsPNGEncoder.cpp:72  cfi
2  XUL  mozilla::image::EncodeImageData(mozilla::gfx::DataSourceSurface*, mozilla::gfx::DataSourceSurface&::ScopedMap, nsTSubstring<char> const&, nsTSubstring<char16_t> const&, nsIInputStream**)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/imgTools.cpp:442  cfi
3  XUL  mozilla::image::EncodeImageData(mozilla::gfx::DataSourceSurface*, nsTSubstring<char> const&, nsTSubstring<char16_t> const&, nsIInputStream**)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/imgTools.cpp:460  inlined
3  XUL  mozilla::image::imgTools::EncodeScaledImage(imgIContainer*, nsTSubstring<char> const&, int, int, nsTSubstring<char16_t> const&, nsIInputStream**)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/image/imgTools.cpp:527  cfi
4  XUL  nsFaviconService::OptimizeIconSizes(mozilla::places::IconData&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/components/places/nsFaviconService.cpp:711  cfi
5  XUL  nsFaviconService::SetFaviconForPage(nsIURI*, nsIURI*, nsIURI*, long long, bool, JSContext*, mozilla::dom::Promise**)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/components/places/nsFaviconService.cpp:373  cfi
6  XUL  _NS_InvokeByIndex   cfi
7  XUL  CallMethodHelper::Invoke()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/xpconnect/src/XPCWrappedNative.cpp:1620  inlined
7  XUL  CallMethodHelper::Call()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/xpconnect/src/XPCWrappedNative.cpp:1174  inlined
7  XUL  XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/xpconnect/src/XPCWrappedNative.cpp:1120  cfi
8  XUL  XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:966  cfi
9  XUL  CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:501  inlined
9  XUL  js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:597  cfi
10  XUL  InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:664  inlined
10  XUL  js::CallFromStack(JSContext*, JS::CallArgs const&, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:669  inlined
10  XUL  js::Interpret(JSContext*, js::RunState&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:3287  cfi
11  XUL  MaybeEnterInterpreterTrampoline(JSContext*, js::RunState&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:395  inlined
11  XUL  js::RunScript(JSContext*, js::RunState&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:471  cfi
12  XUL  js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:629  cfi
13  XUL  InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:664  inlined
13  XUL  js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/Interpreter.cpp:696  cfi
14  XUL  JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/js/src/vm/CallAndConstruct.cpp:119  cfi
15  XUL  mozilla::dom::MessageListener::ReceiveMessage(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::ReceiveMessageArgument const&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&)  s3:gecko-generated-sources-l1:d01fd731b0a2dd12291b621774d88619c1a60454bd2760958e3a1055b2740665edb6c236dbdb2666091892c4f2013157a68f155e2aa02d601f037ad66b6bad8f/dom/bindings/MessageManagerBinding.cpp::5756  cfi
16  XUL  mozilla::dom::MessageListener::ReceiveMessage(mozilla::dom::ReceiveMessageArgument const&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObjectBase::ExceptionHandling, JS::Realm*)  s3:gecko-generated-sources-l1:9b2aa2670f267a2aefbcacbab9744eb3c967b72481e28f8725f4ed272f5c42f1d186cbd276a0dd4a5ddf6a57cf2273b50a8620351bb9e78546b89866bf8ede5b/dist/include/mozilla/dom/MessageManagerBinding.h::579  inlined
16  XUL  mozilla::dom::JSActor::CallReceiveMessage(JSContext*, mozilla::dom::JSActorMessageMeta const&, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/dom/ipc/jsactor/JSActor.cpp:289  cfi
17  XUL  mozilla::dom::JSActor::ReceiveMessage(JSContext*, mozilla::dom::JSActorMessageMeta const&, JS::Handle<JS::Value>, mozilla::ErrorResult&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/dom/ipc/jsactor/JSActor.cpp:305  cfi
18  XUL  mozilla::dom::JSActorManager::ReceiveRawMessage(mozilla::dom::JSActorMessageMeta const&, std::__1::unique_ptr<mozilla::dom::ipc::StructuredCloneData, std::__1::default_delete<mozilla::dom::ipc::StructuredCloneData> >, std::__1::unique_ptr<mozilla::dom::ipc::StructuredCloneData, std::__1::default_delete<mozilla::dom::ipc::StructuredCloneData> >)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/dom/ipc/jsactor/JSActorManager.cpp:226  cfi
19  XUL  mozilla::dom::WindowGlobalParent::RecvRawMessage(mozilla::dom::JSActorMessageMeta const&, std::__1::unique_ptr<mozilla::dom::ClonedMessageData, std::__1::default_delete<mozilla::dom::ClonedMessageData> > const&, std::__1::unique_ptr<mozilla::dom::ClonedMessageData, std::__1::default_delete<mozilla::dom::ClonedMessageData> > const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/dom/ipc/WindowGlobalParent.cpp:569  cfi
20  XUL  mozilla::dom::PWindowGlobalParent::OnMessageReceived(IPC::Message const&)  s3:gecko-generated-sources-l1:b9d0391237c1672b1bd690918a26ecd26e7b82540a738291ae5a277bef1c3a57a27b451b8e6af46f80428546cccae96c95f35728b7784600cfc385bd5e3d2a79/ipc/ipdl/PWindowGlobalParent.cpp::903  cfi
21  XUL  mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)  s3:gecko-generated-sources-l1:a518c1c70c82c36194e2189e4f3f6fb94ebbdc9d2070858295f5b9877e2741175e32b75e881c198574d46d9b926a385bf539b8b099d6da3c6e07739dc33f87ee/ipc/ipdl/PContentParent.cpp::6412  cfi
22  XUL  mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/ipc/glue/MessageChannel.cpp:1797  cfi
23  XUL  mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, std::__1::unique_ptr<IPC::Message, std::__1::default_delete<IPC::Message> >)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/ipc/glue/MessageChannel.cpp:1723  cfi
24  XUL  mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/ipc/glue/MessageChannel.cpp:1512  cfi
25  XUL  mozilla::ipc::MessageChannel::MessageTask::Run()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/ipc/glue/MessageChannel.cpp:1614  cfi
26  XUL  mozilla::RunnableTask::Run()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:703  cfi
27  XUL  mozilla::TaskController::RunTask(mozilla::Task*)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:228  inlined
27  XUL  mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:1323  cfi
28  XUL  mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:1146  cfi
29  XUL  mozilla::TaskController::ProcessPendingMTTask(bool)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:639  inlined
29  XUL  mozilla::TaskController::TaskController()::$_0::operator()() const  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/TaskController.cpp:333  inlined
29  XUL  mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_0>::Run()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/nsThreadUtils.h:549  cfi
30  XUL  nsThread::ProcessNextEvent(bool, bool*)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/nsThread.cpp:1157  cfi
31  XUL  NS_ProcessPendingEvents(nsIThread*, unsigned int)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/xpcom/threads/nsThreadUtils.cpp:427  cfi
32  XUL  nsBaseAppShell::NativeEventCallback()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/widget/nsBaseAppShell.cpp:87  cfi
33  XUL  nsAppShell::ProcessGeckoEvents(void*)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/widget/cocoa/nsAppShell.mm:534  cfi
34  CoreFoundation  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__   cfi
35  CoreFoundation  __CFRunLoopDoSource0   cfi
36  CoreFoundation  __CFRunLoopDoSources0   cfi
37  CoreFoundation  __CFRunLoopRun   cfi
38  CoreFoundation  CFRunLoopRunSpecific   cfi
39  HIToolbox  RunCurrentEventLoopInMode   cfi
40  HIToolbox  ReceiveNextEventCommon   cfi
41  HIToolbox  _BlockUntilNextEventMatchingListInModeWithFilter   cfi
42  AppKit  _DPSNextEvent   cfi
43  AppKit  -[NSApplication(NSEventRouting) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]   cfi
44  XUL  -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/widget/cocoa/nsAppShell.mm:189  cfi
45  AppKit  -[NSApplication run]   cfi
46  XUL  -[GeckoNSApplication run]  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/widget/cocoa/nsAppShell.mm:173  cfi
47  XUL  nsAppShell::Run()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/widget/cocoa/nsAppShell.mm:864  cfi
48  XUL  nsAppStartup::Run()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/components/startup/nsAppStartup.cpp:291  cfi
49  XUL  XREMain::XRE_mainRun()  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/xre/nsAppRunner.cpp:5922  cfi
50  XUL  XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/xre/nsAppRunner.cpp:6167  cfi
51  XUL  XRE_main(int, char**, mozilla::BootstrapConfig const&)  /private/tmp/nix-build-firefox-unwrapped-144.0.drv-0/firefox-144.0/toolkit/xre/nsAppRunner.cpp:6240  cfi

How to search for these crashes:

https://crash-stats.mozilla.org/search/?proto_signature=~ConvertHostARGBRow&date=%3E%3D2025-09-21T15%3A28%3A00.000Z&date=%3C2025-10-21T15%3A28%3A00.000Z&_facets=signature&_facets=proto_signature&_facets=platform_version&_facets=cpu_arch&_facets=version&_facets=release_channel&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform_version#facet-release_channel
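The same search could probably be scripted. The sketch below only assembles and prints a query URL; nothing is fetched. It assumes that crash-stats exposes a SuperSearch API at /api/SuperSearch/ that accepts the same query parameters as the search page above, including `_results_number`. The function name `supersearch_url` and its arguments are illustrative, not part of any Mozilla tool.

```shell
#!/bin/sh
# Build a crash-stats query URL for a proto-signature substring and date range.
# Assumption: /api/SuperSearch/ takes the same parameters as the /search/ page
# linked above. This only prints the URL; it does not contact the server.
supersearch_url() {
  needle=$1   # substring to look for in the proto signature
  start=$2    # inclusive start date, e.g. 2025-09-21
  end=$3      # exclusive end date, e.g. 2025-10-21
  printf '%s' "https://crash-stats.mozilla.org/api/SuperSearch/"
  printf '?proto_signature=~%s' "$needle"
  # %3E%3D is the URL-encoded ">=" and %3C is "<"
  printf '&date=%%3E%%3D%s&date=%%3C%s' "$start" "$end"
  printf '&_facets=release_channel&_facets=version&_results_number=0\n'
}

supersearch_url ConvertHostARGBRow 2025-09-21 2025-10-21
```

The printed URL could then be passed to a tool such as curl, if the endpoint behaves as assumed.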

Hardware: ARM64 → Unspecified

I searched on "nix-build-firefox-unwrapped" and found a hit that seems to indicate these crashes happen in a build extracted from a "Nix package".

There's a "Firefox Nix Package" at https://mynixos.com/nixpkgs/package/firefox. But it's version 142.0.1, and in any case I can't figure out how to "install" it (even after signing in).

I just checked the only third-party Firefox build I have access to (running on Ubuntu Linux), and it's on the "release" channel. So a third-party Firefox release running on the "default" channel is (probably) unorthodox, but maybe not unheard of.

Summary: Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, mostly on FF 144 → Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, mostly on FF 144, possibly on a third-party release
Component: Graphics → Graphics: ImageLib

It seems more like a time-based thing than a version-based thing, because there are a decent number of crashes on 143, and the crashes started right at the start of 144, when 143 was still being used significantly. Also, I checked ImageEncoder and nsPNGEncoder: no changes for 144.

Almost all of these are crashing the parent process.

And it looks like if a machine crashes once with this signature it is very likely to crash again with the same signature.

From looking at a random selection of crashes it seems like it is more common to have a stack like https://crash-stats.mozilla.org/report/index/c3684c19-2a51-4a4e-bac2-b7f9e0251017 where it goes through nsFaviconService::SetFaviconForPage and nsFaviconService::OptimizeIconSizes vs something like comment 0 where it starts at EncodingRunnable::Run.

nsFaviconService.h/cpp also wasn't touched for 144.

(In reply to Timothy Nikkel (:tnikkel) from comment #4)
> From looking at a random selection of crashes it seems like it is more common to have a stack like https://crash-stats.mozilla.org/report/index/c3684c19-2a51-4a4e-bac2-b7f9e0251017 where it goes through nsFaviconService::SetFaviconForPage and nsFaviconService::OptimizeIconSizes vs something like comment 0 where it starts at EncodingRunnable::Run.
>
> nsFaviconService.h/cpp also wasn't touched for 144.

You're right. And, oddly, I already knew this, from looking at the Proto Signature Facet from my search in comment #0. Brain fart, I guess. I've edited comment #0 to fix the problem.

Before we can make progress here, we need to get hold of whatever third-party Firefox distro(s) is/are experiencing the crashes. I'll keep trying, but it'd be good to hear from someone who knows more about third-party distros than I do, particularly "Nix Packages".

Maybe we can try to contact the Nix people since it seems specific to them?

(In reply to Timothy Nikkel (:tnikkel) from comment #8)
> Maybe we can try to contact the Nix people since it seems specific to them?

Sounds good to me. I think it should probably be someone from Mozilla who does it. I think the "Nix people" are at https://nixos.org/.

I've set up a macOS 15.7.1 VM on which to play around with "Nix packages". I'll report here if I find anything interesting.

I filed an issue on their GitHub, since that seemed to be the best way to establish contact: https://github.com/NixOS/nixpkgs/issues/454734

The Nix package manager is a tough nut to crack. But as best I can tell, all its official "packages" are source distros, and need to be built each time they're installed. I haven't yet managed to do that: There's a bug that hasn't yet been fixed in a Nix release.

So the builds whose crashes we're tracking here are presumably unofficial in some way.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #11)
> So the builds whose crashes we're tracking here are presumably unofficial in some way.

But all the recent "Nix package" crashes on FF 144 have the same build id (20251009125714) as Mozilla's own FF 144 builds.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #12)
> (In reply to Steven Michaud [:smichaud] (Retired) from comment #11)
> > So the builds whose crashes we're tracking here are presumably unofficial in some way.
>
> But all the recent "Nix package" crashes on FF 144 have the same build id (20251009125714) as Mozilla's own FF 144 builds.

That might just be because anything that builds 144 gets that build id?

I can reproduce these crashes, using an ARM64 Nix Package build of Firefox 144 made on macOS 26.0.1!

bp-143fd849-2927-422c-bafb-cb5720251024
bp-5231f0a3-1d8c-4a04-8c7e-c1fc70251024

This bug's crashes are definitely caused by a bug in LLVM 21. I'm able to trigger them in a local mozilla-central build made using clang and lld from LLVM 21. This happens with LLVM 21.1.2 as used by Nix, and also with the latest release (LLVM 21.1.4).

They don't happen with LLVM 20, which should be good news to those working on bug 1923255.

Source and binary distros for LLVM are available here. But there are no macOS binaries for LLVM 21 -- only for LLVM 20. So I had to build them myself, which was a royal pain.

llvm.org's build instructions are incomplete and misleading, so I think it's worthwhile documenting how I built LLVM 21. The basic commands I used were as follows. I first installed Homebrew's cmake and ninja.

cmake -S llvm -B build -G Ninja -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra;lld" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DLIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS=ON
ninja -C build all
ninja -C build install

LLVM recommends building in two stages -- first using the Apple compiler and linker (clang and ld64); second using the LLVM compiler and linker (clang and lld) that you just installed. I did this. For the first stage I used an additional parameter (for example -DCMAKE_INSTALL_PREFIX=/staging/directory) to install into a staging directory. For the second, I temporarily put the staging directory at the start of the path: For example export PATH="/staging/directory/bin:$PATH". And I changed the install prefix to point to something more permanent -- for example -DCMAKE_INSTALL_PREFIX=/usr/local/llvm-project-21.1.4. For this you need (of course) to use sudo ninja -C build install.
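The two stages described above can be sketched as a single script. This is a dry run by default: it prints the commands rather than executing them. The cmake options, the staging path and the final prefix come from the comment above. The separate build-stage1 and build-stage2 directories, the `run` and `build_stage` helpers, and the `DRY_RUN` switch are assumptions added for illustration. The script is meant to be run from the root of an llvm-project checkout.

```shell
#!/bin/sh
# Two-stage LLVM build, following the steps described above.
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to run.
# Assumptions (not from the original comment): one build directory per stage.
DRY_RUN=${DRY_RUN:-1}
STAGING=/staging/directory
FINAL=/usr/local/llvm-project-21.1.4

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# build_stage BUILD_DIR PREFIX: configure and build, installing to PREFIX later.
build_stage() {
  build_dir=$1
  prefix=$2
  run cmake -S llvm -B "$build_dir" -G Ninja \
    "-DLLVM_ENABLE_PROJECTS=clang;clang-tools-extra;lld" \
    "-DLLVM_ENABLE_RUNTIMES=libcxx;libcxxabi;libunwind;compiler-rt" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS=ON \
    "-DCMAKE_INSTALL_PREFIX=$prefix"
  run ninja -C "$build_dir" all
}

# Stage 1: Apple's clang and ld64, installed into the staging directory.
build_stage build-stage1 "$STAGING"
run ninja -C build-stage1 install

# Stage 2: the freshly built tools come first on PATH; the install needs sudo.
PATH="$STAGING/bin:$PATH"
export PATH
build_stage build-stage2 "$FINAL"
run sudo ninja -C build-stage2 install
```

If CMake does not pick up the staged clang from PATH alone, setting the CC and CXX environment variables before the second configure step is the usual way to force the choice of compiler.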

To get ./mach configure and ./mach build to use the "wrong" version of LLVM, I created new shortcuts for clang and lld in ~/.mozbuild/clang/bin.
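One way to create those shortcuts is with symbolic links. The helper below is a sketch under assumptions: the function name `link_tools` is made up for illustration, and the exact set of tool names that ./mach looks for in ~/.mozbuild/clang/bin is a guess. Any existing regular file is renamed with a .orig suffix rather than deleted, so the bootstrapped toolchain can be restored.

```shell
#!/bin/sh
# link_tools SRC_BIN DEST_BIN TOOL...
# Point DEST_BIN/TOOL at SRC_BIN/TOOL for each named tool. An existing
# non-symlink entry is kept as TOOL.orig. Tools missing from SRC_BIN are
# reported on stderr and skipped.
link_tools() {
  src_bin=$1
  dest_bin=$2
  shift 2
  mkdir -p "$dest_bin" || return 1
  for tool in "$@"; do
    target=$dest_bin/$tool
    if [ -x "$src_bin/$tool" ]; then
      if [ -e "$target" ] && [ ! -L "$target" ]; then
        mv "$target" "$target.orig"
      fi
      ln -sf "$src_bin/$tool" "$target"
    else
      echo "skipping $tool: not found in $src_bin" >&2
    fi
  done
}

# Example (not run here; the tool list is an assumption):
#   link_tools /usr/local/llvm-project-21.1.4/bin "$HOME/.mozbuild/clang/bin" \
#     clang clang++ lld ld64.lld
```

Removing the symlinks and renaming the .orig files back undoes the change.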

./mach configure fails with both LLVM 21 and LLVM 20 unless you pass --without-wasm-sandboxed-libraries (that is, ./mach configure --without-wasm-sandboxed-libraries). For both versions I also needed to patch trunk code as follows:

diff --git a/third_party/zucchini/chromium/components/zucchini/suffix_array_unittest.cc b/third_party/zucchini/chromium/components/zucchini/suffix_array_unittest.cc
--- a/third_party/zucchini/chromium/components/zucchini/suffix_array_unittest.cc
+++ b/third_party/zucchini/chromium/components/zucchini/suffix_array_unittest.cc
@@ -22,7 +22,8 @@ using SLType = InducedSuffixSort::SLType
 
 }  // namespace
 
-using ustring = std::basic_string<unsigned char>;
+//using ustring = std::basic_string<unsigned char>;
+using ustring = std::vector<unsigned char>;
 
 constexpr uint16_t kNumChar = 256;

I found valuable help here and here. This is where I learned that you need to use -DLIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS=ON.

I'll be trying to find the bug in LLVM 21. It may take a while.

FYI glandium: this seems to be a bug in LLVM 21, in case that affects any of your plans.

Blocks: clang-21

Thanks for the investigation. We have now pinned LLVM 20 for aarch64-darwin in nixpkgs and are watching this issue.

Severity: -- → S3
Summary: Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, mostly on FF 144, possibly on a third-party release → Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, mostly on FF 144, on a third-party release (a Nix Package)
Summary: Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, mostly on FF 144, on a third-party release (a Nix Package) → Crashes [@ nsPNGEncoder::ConvertHostARGBRow ] on macOS, LLVM 21, on a third-party release (a Nix Package)

I've found the LLVM commit that caused these crashes, and have commented there.

I've been trying to write a reduced testcase, so far without any success. Once I've managed it, I'll open an issue at the LLVM Project.

For those who are curious, here are two patches, each of which works around the LLVM 21 bug I found. These apply to the release/21.x branch.

diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 742de1101faa..f6a6c9ef9d19 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -942,7 +942,9 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
       // are the same and are not referring to a reserved register). If so,
       // delete it.
       if (RegSrc == RegDef && !MRI->isReserved(RegSrc)) {
+#if (0)
         MI.eraseFromParent();
+#endif
         NumDeletes++;
         Changed = true;
         continue;
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index 5420545cc3ce..65e4c0f11373 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -10057,6 +10057,7 @@ AArch64InstrInfo::isCopyInstrImpl(const MachineInstr &MI) const {
 
   // AArch64::ORRWrs and AArch64::ORRXrs with WZR/XZR reg
   // and zero immediate operands used as an alias for mov instruction.
+#if (0)
   if (((MI.getOpcode() == AArch64::ORRWrs &&
         MI.getOperand(1).getReg() == AArch64::WZR &&
         MI.getOperand(3).getImm() == 0x0) ||
@@ -10069,6 +10070,7 @@ AArch64InstrInfo::isCopyInstrImpl(const MachineInstr &MI) const {
        MI.findRegisterDefOperandIdx(getXRegFromWReg(MI.getOperand(0).getReg()),
                                     /*TRI=*/nullptr) == -1))
     return DestSourcePair{MI.getOperand(0), MI.getOperand(2)};
+#endif
 
   if (MI.getOpcode() == AArch64::ORRXrs &&
       MI.getOperand(1).getReg() == AArch64::XZR &&

And here's a debug logging patch that I've been using:

diff --git a/llvm/lib/CodeGen/MachineCopyPropagation.cpp b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
index 742de1101faa..ca3d3c5dc433 100644
--- a/llvm/lib/CodeGen/MachineCopyPropagation.cpp
+++ b/llvm/lib/CodeGen/MachineCopyPropagation.cpp
@@ -450,10 +450,18 @@ public:
   }
 };
 
+// This debug logging uses https://github.com/steven-michaud/PySerialPortLogger.
+// Install it, then run Terminal and open three tabs. Then run serialportlogger
+// in its third tab.
+#define BUGZILLA_1995582_DEBUG_LOG 1
+
 class MachineCopyPropagation {
   const TargetRegisterInfo *TRI = nullptr;
   const TargetInstrInfo *TII = nullptr;
   const MachineRegisterInfo *MRI = nullptr;
+#ifdef BUGZILLA_1995582_DEBUG_LOG
+  const MachineFunction *MF_ = nullptr;
+#endif
 
   // Return true if this is a copy instruction and false otherwise.
   bool UseCopyInstr;
@@ -874,6 +882,31 @@ void MachineCopyPropagation::forwardUses(MachineInstr &MI) {
   }
 }
 
+#ifdef BUGZILLA_1995582_DEBUG_LOG
+#include <stdarg.h>
+#include <stdio.h>
+#include <fcntl.h>
+#include <termios.h>
+
+#define VIRTUAL_SERIAL_PORT "/dev/ttys003"
+bool g_virtual_serial_checked = false;
+int g_virtual_serial = -1;
+std::unique_ptr<raw_fd_ostream> g_virtual_serial_stream;
+
+static void maybe_initialize_tty()
+{
+  if (!g_virtual_serial_checked) {
+    g_virtual_serial_checked = true;
+    g_virtual_serial =
+      open(VIRTUAL_SERIAL_PORT, O_WRONLY | O_NONBLOCK | O_NOCTTY);
+    if (g_virtual_serial >= 0) {
+      g_virtual_serial_stream =
+        std::make_unique<raw_fd_ostream>(g_virtual_serial, false, true);
+    }
+  }
+}
+#endif
+
 void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
   LLVM_DEBUG(dbgs() << "MCP: ForwardCopyPropagateBlock " << MBB.getName()
                     << "\n");
@@ -942,6 +975,17 @@ void MachineCopyPropagation::ForwardCopyPropagateBlock(MachineBasicBlock &MBB) {
       // are the same and are not referring to a reserved register). If so,
       // delete it.
       if (RegSrc == RegDef && !MRI->isReserved(RegSrc)) {
+#ifdef BUGZILLA_1995582_DEBUG_LOG
+        maybe_initialize_tty();
+        *g_virtual_serial_stream << "******\n";
+        *g_virtual_serial_stream << "MachineInstr::eraseFromParent(UseCopyInstr " << UseCopyInstr << "):\n\n";
+        MI.print(*g_virtual_serial_stream);
+        *g_virtual_serial_stream << "\n";
+        tcdrain(g_virtual_serial);
+        MF_->print(*g_virtual_serial_stream);
+        tcdrain(g_virtual_serial);
+        *g_virtual_serial_stream << "******\n";
+#endif
         MI.eraseFromParent();
         NumDeletes++;
         Changed = true;
@@ -1616,6 +1660,9 @@ bool MachineCopyPropagation::run(MachineFunction &MF) {
   TRI = MF.getSubtarget().getRegisterInfo();
   TII = MF.getSubtarget().getInstrInfo();
   MRI = &MF.getRegInfo();
+#ifdef BUGZILLA_1995582_DEBUG_LOG
+  MF_ = &MF;
+#endif
 
   for (MachineBasicBlock &MBB : MF) {
     if (isSpillageCopyElimEnabled)

I'm still trying to create a reduced test case for the LLVM 21 bug introduced by https://github.com/llvm/llvm-project/pull/129889. It's not going to be easy.

But I did find an easy way for Mozilla to work around this bug: Just add -mllvm -aarch64-enable-copy-propagation=false to CPPFLAGS. Note that this flag has no effect unless you're using -O3 optimization, and that the crashes don't happen at lower levels of optimization (or with no optimization).
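As a sketch of that workaround, the snippet below appends the flag to CPPFLAGS in the current shell. The flag itself is the one quoted above. The snippet assumes that the build picks CPPFLAGS up from the environment (or from a mozconfig, which is sourced as shell); that mechanism is an assumption and has not been checked here.

```shell
#!/bin/sh
# Workaround flag quoted in the comment above. -mllvm forwards the following
# option from clang to the LLVM backend.
WORKAROUND="-mllvm -aarch64-enable-copy-propagation=false"

# Append to any existing CPPFLAGS rather than overwriting them.
CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }$WORKAROUND"
export CPPFLAGS

echo "CPPFLAGS=$CPPFLAGS"
```

Appending rather than assigning keeps any preprocessor flags that were already set for the build.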

I just discovered something interesting: Building Firefox with -O3 -flto (and using LLVM 21 tools) disables copy propagation. The builds take more than twice as long as "normal" builds, but they aren't affected by the copy propagation bug. -flto turns on "link time optimization".

I've given up on writing a reduced testcase, at least for the time being. But I've submitted two lldb sessions here, a "good" one and a "bad" one. Between them they demonstrate how LLVM 21's copy propagation causes this bug's crashes.

The llvm-project is going to back out the copy propagation optimization that caused these crashes. I'll test it once the patch reaches the release/21.x branch.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #24)
> The llvm-project is going to back out the copy propagation optimization that caused these crashes. I'll test it once the patch reaches the release/21.x branch.

This just landed on the release/21.x branch. I tested with it and had no problems. If I understand correctly, the first release containing this patch will be 21.1.6. Given the frequency of past releases, it should come out in a few weeks.
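A build script could use this version boundary to warn about a suspect toolchain. The check below is a sketch: the function name `is_suspect_llvm` is made up, and it assumes that every 21.x release before 21.1.6 carries the problematic optimization, whereas this thread only confirms 21.1.2 and 21.1.4. It also ignores the other conditions mentioned here (AArch64 target, -O3).

```shell
#!/bin/sh
# Succeeds (exit status 0) if the given LLVM version is one this bug report
# would treat as suspect: any 21.x release older than 21.1.6.
# Assumption: all 21.x releases before 21.1.6 are affected.
is_suspect_llvm() {
  old_ifs=$IFS
  IFS=.
  set -- $1                 # split "21.1.4" into 21 1 4
  IFS=$old_ifs
  major=${1:-0}
  minor=${2:-0}
  patch=${3:-0}
  patch=${patch%%[!0-9]*}   # drop suffixes such as "-rc1"
  patch=${patch:-0}
  if [ "$major" -ne 21 ]; then return 1; fi
  if [ "$minor" -lt 1 ]; then return 0; fi
  if [ "$minor" -eq 1 ] && [ "$patch" -lt 6 ]; then return 0; fi
  return 1
}

for v in 20.1.8 21.1.2 21.1.4 21.1.6; do
  if is_suspect_llvm "$v"; then
    echo "$v: suspect"
  else
    echo "$v: ok"
  fi
done
```

A real script would first need to extract the version string from the compiler, for example from the output of clang --version.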

LLVM 21.1.6 was just released, and there's a macOS ARM64 build among its "assets". But it was built incorrectly (without -DLIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS=ON), and so using it to build Firefox triggers this bug (see also).

Its assets don't yet include a source distro (e.g. llvm-project-21.1.6.src.tar.xz). So I'm not yet able to do a local build.

(In reply to Steven Michaud [:smichaud] (Retired) from comment #26)
> Its assets don't yet include a source distro (e.g. llvm-project-21.1.6.src.tar.xz). So I'm not yet able to do a local build.

Actually it does, here. I did a local build of this (with -DLIBCXX_ENABLE_VENDOR_AVAILABILITY_ANNOTATIONS=ON) and had no trouble with it -- either building Firefox (from current trunk code) or running it afterwards.

So maybe we should close this bug as fixed by the LLVM 21.1.6 release. But I think we should wait until the LLVM Project has a macOS ARM64 build that works properly. I reported the problem here. I'm not sure if I'll also need to open a separate issue on it.

For all intents and purposes for Mozilla, this bug is fixed and won't affect builds using the clang toolchain bootstrapped by the build system.

Status: NEW → RESOLVED
Closed: 27 days ago
Resolution: --- → FIXED