Closed Bug 1914446 Opened 10 months ago Closed 10 months ago

PGO with sandboxing fails in Arch Linux build containers

Categories

(Firefox Build System :: General, defect, P2)

Desktop
Linux
defect

Tracking

(firefox-esr115 unaffected, firefox-esr128 unaffected, firefox129 unaffected, firefox130 unaffected, firefox131 fixed)

RESOLVED FIXED
131 Branch

People

(Reporter: heftig, Assigned: gerard-majax)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Arch Linux builds Firefox with PGO (see https://pkgbuild.com/~heftig/packages/firefox-nightly/). This recently started failing.

One issue is that the sandboxed processes are crashing (signal 11/SIGSEGV). Presumably the sandbox does not like our systemd-nspawn-run build containers.

We worked around this by setting the sandbox-disabling environment variables again:

MOZ_DISABLE_CONTENT_SANDBOX=1
MOZ_DISABLE_GMP_SANDBOX=1
MOZ_DISABLE_GPU_SANDBOX=1
MOZ_DISABLE_RDD_SANDBOX=1
MOZ_DISABLE_SOCKET_PROCESS_SANDBOX=1
MOZ_DISABLE_UTILITY_SANDBOX=1
MOZ_DISABLE_VR_SANDBOX=1
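The same workaround can be written more compactly in a build script. This is a sketch that assumes a POSIX shell build step. The sandbox names are exactly the ones listed above, and the final `env` line is only a sanity check for the build log.

```shell
# Disable every sandbox type listed in the bug report before the PGO
# profiling step runs. This is a temporary workaround, not a fix.
for sb in CONTENT GMP GPU RDD SOCKET_PROCESS UTILITY VR; do
  export "MOZ_DISABLE_${sb}_SANDBOX=1"
done

# Show what was exported, so the build log records the workaround.
env | grep '^MOZ_DISABLE_.*_SANDBOX=' | sort
```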

Set release status flags based on info from the regressing bug 1553850

:gerard-majax, since you are the author of the regressor, bug 1553850, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

I'm not working until next Monday, and without a crash it's going to be complicated. If it's really the sandbox, running with MOZ_SANDBOX_LOGGING=1 during PGO and looking at the log should reveal the offending syscall. I would have expected SIGSYS, though, so if it's really the sandbox we might be crashing later.

Run without any sandbox, then try to find out whether one specific sandbox triggers this, or all of them?
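A sketch of that bisection, assuming a bash build step: it turns on MOZ_SANDBOX_LOGGING=1, does a baseline run with every sandbox disabled (`NONE`), then enables one sandbox type at a time while disabling the rest. `run_profile_step` is a hypothetical placeholder for whatever command runs the profiling pass in a given setup (for instance the `$MACH build` call in a PGO build); it must be replaced before use, and each real iteration may be expensive.

```shell
#!/usr/bin/env bash
# Sketch: find which sandbox type, if any, makes the PGO profiling run crash.
SANDBOXES="CONTENT GMP GPU RDD SOCKET_PROCESS UTILITY VR"

# Ask the sandbox code for verbose logging, as suggested above.
export MOZ_SANDBOX_LOGGING=1

# Hypothetical placeholder: replace with the real profiling command.
# It receives the name of the sandbox left enabled, only for logging.
run_profile_step() {
  echo "profiling with only the $1 sandbox enabled"
}

# "NONE" matches no sandbox name, so that run has every sandbox disabled.
for keep in NONE $SANDBOXES; do
  for sb in $SANDBOXES; do
    if [ "$sb" = "$keep" ]; then
      unset "MOZ_DISABLE_${sb}_SANDBOX"      # leave this one enabled
    else
      export "MOZ_DISABLE_${sb}_SANDBOX=1"   # disable all the others
    fi
  done
  if run_profile_step "$keep" > "profile-$keep.log" 2>&1; then
    echo "$keep: ok"
  else
    echo "$keep: FAILED (see profile-$keep.log)"
  fi
done
```

Keeping one log per iteration makes it easy to compare the sandbox log output of a crashing run with the baseline.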

Flags: needinfo?(lissyx+mozillians)

:gerard-majax, do you have a priority/severity for this? Setting this as fix-optional for Fx131, since it only affects sandboxed processes.

Flags: needinfo?(lissyx+mozillians)

It's only failing on Arch Linux builds, and without more diagnostic data it's hard to say.

Flags: needinfo?(lissyx+mozillians)

Jan, there's no log at the package link, so there's nothing I can do.

Flags: needinfo?(jan.steffens)
Severity: -- → S4
Priority: -- → P4

I'm wondering whether the Snap build might be hitting the same issue?

from https://launchpadlibrarian.net/746294945/buildlog_snap_ubuntu_jammy_amd64_firefox-snap-nightly_BUILDING.txt.gz:

:: [Parent 66876, IPC I/O Parent] WARNING: process 67030 exited on signal 11: file /build/firefox/parts/firefox/build/ipc/chromium/src/base/process_util_posix.cc:335
:: [Parent 66876, IPC I/O Parent] WARNING: process 66987 exited on signal 11: file /build/firefox/parts/firefox/build/ipc/chromium/src/base/process_util_posix.cc:335
:: console.error: ({})
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18363079133201007874_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_7386526750823653679_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_1971493760237642638_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_3427965757053527218_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_1680655296004190972_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_8052597566144741601_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_5955242583521650425_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_11839711634931554538_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_18260465015243514312_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_16345446069847510270_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67100_random_1138732080130610960_0.profraw": No such file or directory
:: [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
:: console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error for Amo: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
:: [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
:: console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error for Amp: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
:: [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
:: console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error for Wikipedia: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
:: [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
:: console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error for Mdn: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
:: [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
:: console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error for Yelp: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18363079133201007874_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_7386526750823653679_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_1971493760237642638_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_3427965757053527218_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_1680655296004190972_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_8052597566144741601_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_5955242583521650425_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_11839711634931554538_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_18260465015243514312_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_16345446069847510270_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_67094_random_1138732080130610960_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_1500142544453982149_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18363079133201007874_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_7386526750823653679_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_1971493760237642638_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_3427965757053527218_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18405209413954990729_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_1680655296004190972_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_8052597566144741601_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_5955242583521650425_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_11839711634931554538_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_18260465015243514312_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_16345446069847510270_0.profraw": No such file or directory
:: LLVM Profile Error: Failed to write file "/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented/default_66967_random_1138732080130610960_0.profraw": No such file or directory
:: started SP3 server on port 8000
:: Firefox exited with code 245 during profile initialization

This does not repro on our CI ...

Amin, can you confirm? I don't have access to enough of the logs, but did it start failing ~8 days ago, when https://bugzilla.mozilla.org/show_bug.cgi?id=1553850 landed?

Flags: needinfo?(bandali)

Sorry, I didn't have time to look at this lately, but the logs from the Snap build look much like what I remember.

I'll see if I can reproduce it locally.

I sadly don't readily have access to older snap build logs, but looking on https://snapcraft.io/firefox I see the last publish date for edge (Nightly) for amd64 was indeed on 20 August 2024. I'll see if I can get a more comprehensive history of build logs.

Flags: needinfo?(bandali)

Here's a log with MOZ_SANDBOX_LOGGING=1.

Flags: needinfo?(jan.steffens)

(In reply to Jan Alexander Steffens [:heftig] from comment #11)

Created attachment 9421326 [details]
firefox-nightly-131.0a1+20240828.1+h81b9ba82d9b8-1-x86_64-build.log

Here's a log with MOZ_SANDBOX_LOGGING=1.

Thanks, it looks similar. I have not been able to find the minidump on the Snap builds yet; do you see it? It should be under minidumps in the profile: https://searchfox.org/mozilla-central/rev/45d6f8bf028e049f812aa26dced565d50068af5d/build/pgo/profileserver.py#32-59

There is no profile anywhere. I guess TemporaryDirectory just deletes it?

Indeed. I hacked the Snap build a bit:

diff --git a/snapcraft.yaml b/snapcraft.yaml
index 2fbca94..79338bd 100644
--- a/snapcraft.yaml
+++ b/snapcraft.yaml
@@ -313,6 +313,7 @@ parts:
       - wasi-sdk
     build-packages:
       - cmake
+      - ccache
       - coreutils
       - file
       - git
@@ -416,6 +417,9 @@ parts:
         export LD_LIBRARY_PATH="$CRAFT_PART_BUILD/obj-$TARGET_TRIPLET/instrumented/dist/bin${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
       fi
       MACH="/usr/bin/python3 ./mach"
+      export MOZ_SANDBOX_LOGGING=1
+      export UPLOAD_PATH=$CRAFT_PROJECT_DIR/
+      export MOZ_FETCHES_DIR=$CRAFT_PROJECT_DIR/fetches/
       if [ $CRAFT_TARGET_ARCH = "amd64" ]; then
         # xvfb is only needed when doing a PGO-enabled build
         xvfb-run '--server-args=-screen 0 1920x1080x24' $MACH build --verbose -j$CRAFT_PARALLEL_BUILD_COUNT

and got:

2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: PROCESS-CRASH | unable to find a usable font (serif) [@ libxul.so + 0x000000000589f2c7] | Profile initialization
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Crash dump filename: /tmp/tmpfr92nba5/minidumps/1eb3ed44-d839-d230-1362-a322d2f8d50a.dmp
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Process type: content
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Process pid: 58230
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Mozilla crash reason: unable to find a usable font (serif)
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Operating system: Linux
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 ::                   6.10.3-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.3-1 (2024-08-04)
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: CPU: amd64
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 ::      family 23 model 49 stepping 0
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 ::      128 CPUs
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Linux Ubuntu 22.04 - jammy (Ubuntu 22.04 LTS)
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 ::
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Crash reason:  SIGSEGV / SEGV_MAPERR
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Crash address: 0x0000000000000000
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Crashing instruction: `mov dword [0x0], 0x8af`
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 :: Memory accessed by instruction:
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.138 ::   0. Address: 0x0000000000000000
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.139 ::      Size: 4
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.139 :: Process uptime: not available
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.139 ::
2024-08-28 18:24:26.312 :: 2024-08-28 18:14:20.139 :: Thread 0 WebExtensions (crashed)
Severity: S4 → S3
Priority: P4 → P2
Assignee: nobody → lissyx+mozillians

Hmm, I do see lines like this one in our build log:

[GFX1]: no fonts - init: 1 fonts: 12 loader: 0

Our build has these 11 font families installed:

# fc-list -v | grep family: | sort -u
	family: "Cantarell"(s)
	family: "FreeMono"(s)
	family: "FreeSans"(s)
	family: "FreeSerif"(s)
	family: "Source Code Pro"(s)
	family: "Source Code Pro"(s) "Source Code Pro Black"(s)
	family: "Source Code Pro"(s) "Source Code Pro ExtraLight"(s)
	family: "Source Code Pro"(s) "Source Code Pro Light"(s)
	family: "Source Code Pro"(s) "Source Code Pro Medium"(s)
	family: "Source Code Pro"(s) "Source Code Pro Semibold"(s)
	family: "SourceCodeVF"(s)

That fits with the 12 fonts reported, I guess, if you also count Twemoji Mozilla.
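To see which installed font fontconfig actually picks for the generic families (the crash reason above mentions serif), a quick check like this can help. It is a sketch that assumes fontconfig's `fc-match` is on the PATH; it shows the view from the build environment, not necessarily what a sandboxed content process is able to open.

```shell
# Print fontconfig's best match for each generic font family.
# If fc-match is missing, say so on stderr and skip the check.
check_generic_fonts() {
  if ! command -v fc-match >/dev/null 2>&1; then
    echo "fc-match not found; install fontconfig to run this check" >&2
    return 0
  fi
  for generic in serif sans-serif monospace; do
    printf '%s -> %s\n' "$generic" "$(fc-match "$generic")"
  done
}

check_generic_fonts
```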

I can get a PGO build to finish with MOZ_DISABLE_CONTENT_SANDBOX=1, but I still get those font crashes, so maybe they are unrelated?

OK, so with the sandbox enabled and the code at https://searchfox.org/mozilla-central/rev/45d6f8bf028e049f812aa26dced565d50068af5d/security/sandbox/linux/SandboxFilter.cpp#873-878 changed, it no longer seems to crash. It looks like the openat() bypass triggers this.

Jan, if you have any build logs (profiler-run-*.log) from a few days before and after August 20th, I'd like to see whether they mention segfaults ("exited on signal 11"). I see you are also using xvfb-run, so I'm wondering whether that is a factor here.
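Checking old logs for those crashes can be scripted. A minimal sketch, assuming the logs follow the `profiler-run-*.log` naming mentioned above and sit in the current directory; it counts the `exited on signal 11` warnings per file and silently skips the pattern if nothing matches.

```shell
# Count "exited on signal 11" warnings in each profiler-run log.
count_segfaults() {
  for log in "$@"; do
    [ -f "$log" ] || continue   # unmatched glob stays literal; skip it
    printf '%s: %s\n' "$log" "$(grep -c 'exited on signal 11' "$log")"
  done
}

count_segfaults profiler-run-*.log
```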

Flags: needinfo?(jan.steffens)

Fixed by the backout of bug 1553850. Jan, you can remove your workaround.

Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED

FTR, a local PGO build with the backout still shows the segfaults, so they are likely unrelated.

I was tricked by logs that had not been overwritten.

Target Milestone: --- → 131 Branch

Unfortunately, we only keep logs for the latest Nightly build.

I don't have any segfaults in either the current Nightly or any of our past release builds dating back to March. However, our release builds do have LLVM Profile Errors, while Nightly does not.

Flags: needinfo?(jan.steffens)

(In reply to Jan Alexander Steffens [:heftig] from comment #22)

Unfortunately, we only keep logs for the latest Nightly build.

I don't have any segfaults in either the current Nightly or any of our past release builds dating back to March. However, our release builds do have LLVM Profile Errors, while Nightly does not.

No problem; as mentioned in bug 1553850, the problem was due to our chroot() call, and I can't explain why it was working. The patches on that bug have been updated so they should disable the chroot, and in my testing that was enough. Can you try a build on your side?

Flags: needinfo?(jan.steffens)

Tested your current patch set; profiling worked and the profile was of a reasonable size (106,000,296 bytes). Thanks!

Flags: needinfo?(jan.steffens)
Attached file profiling log

Thanks for checking!

