Closed Bug 1869010 Opened 10 months ago Closed 9 months ago

Snap nightly 122 build with MOZ_PGO fails

Categories

(Firefox Build System :: Third Party Packaging, defect)

Firefox 122
Desktop
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bandali, Assigned: gerard-majax)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

Recently, Firefox 122 nightly snap with MOZ_PGO=1 has been failing to build with errors like these:

:: 76:39.46 gmake: Leaving directory '/build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/instrumented'
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: [GFX1-]: glxtest: libpci missing
:: [GFX1-]: glxtest: libEGL no display
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: ({})
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: ({})
:: LLVM Profile Error: Failed to write file "default_75355_random_18026880182405283074_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_7386420086373065007_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_1971493760237642638_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_3427444493993907378_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_1680651968477229820_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_1852370830671597793_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_8968776140486091983_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_5689700494317048042_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_18260465015243514312_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_16345446069847510270_0.profraw": Broken pipe
:: LLVM Profile Error: Failed to write file "default_75355_random_1138732080130610960_0.profraw": Broken pipe
:: [GFX1-]: glxtest: libpci missing
:: [GFX1-]: glxtest: libEGL no display
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: ({})
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new TypeError("can't access property \"find\", this._searchProviderInfo is null", "resource:///modules/SearchSERPTelemetry.sys.mjs", 743))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: console.error: (new Error("Unexpected content-type \"text/plain;charset=US-ASCII\"", "resource://services-settings/Utils.sys.mjs", 406))
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: console.error: (new Error("Polling for changes failed: Unexpected content-type \"text/plain;charset=US-ASCII\".", "resource://services-settings/remote-settings.sys.mjs", 324))
:: JavaScript error: resource://gre/modules/XULStore.sys.mjs, line 60: Error: Can't find profile directory.
:: ExceptionHandler::WaitForContinueSignal waiting for continue signal...
:: ExceptionHandler::GenerateDump cloned child 76246
:: ExceptionHandler::SendContinueSignalToChild sent continue signal to child
:: Exiting due to channel error.
:: Exiting due to channel error.
:: Exiting due to channel error.
:: Exiting due to channel error.
:: Exiting due to channel error.
:: Exiting due to channel error.
:: Exiting due to channel error.
:: jarlog: /build/firefox/parts/firefox/build/obj-x86_64-pc-linux-gnu/jarlog/en-US.log
:: Firefox exited with code 11 during profiling
::  Config object not found by mach.
:: Configure complete!
:: Be sure to run |mach build| to pick up any changes
:: To view a profile of the build, run |mach resource-usage|.
:: To take your build for a test drive, run: |mach run|
:: For more information on what to do now, see https://firefox-source-docs.mozilla.org/setup/contributing_code.html
::   Parallelism determined by memory: using 4 jobs for 4 cores based on 15.6 GiB RAM and estimated job size of 1.0 GiB
:: Error running mach:
::
::     mach build --verbose -j4
::
:: The error occurred in code that was called by the mach command. This is either
:: a bug in the called code itself or in the way that mach is calling it.
:: You can invoke ``./mach busted`` to check if this issue is already on file. If it
:: isn't, please use ``./mach busted file build`` to report it. If ``./mach busted`` is
:: misbehaving, you can also inspect the dependencies of bug 1543241.
::
:: If filing a bug, please include the full output of mach, including this error
:: message.
::
:: The details of the failure are as follows:
::
:: subprocess.CalledProcessError: Command '['/build/firefox/parts/firefox/build/.mozbuild/srcdirs/build-8212568f7e0c/_virtualenvs/build/bin/python', '/build/firefox/parts/firefox/build/build/pgo/profileserver.py']' returned non-zero exit status 11.
::
::   File "/build/firefox/parts/firefox/build/python/mozbuild/mozbuild/build_commands.py", line 215, in build
::     subprocess.check_call(pgo_cmd, cwd=instr.topobjdir, env=pgo_env)
::   File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
::     raise CalledProcessError(retcode, cmd)

The build succeeds if I comment out the addition of "ac_add_options MOZ_PGO=1" to the mozconfig. I haven't yet tried a non-snap build with MOZ_PGO=1, but I suspect the results would be similar, as this doesn't seem snap-specific. I'll try to verify that though.

Flags: needinfo?(mozilla)
Flags: needinfo?(lissyx+mozillians)

Last time it was because of build flags; unfortunately I don't have time right now.

Flags: needinfo?(lissyx+mozillians)

Please have a look at those flags; you might need to change the linker and other settings: https://github.com/canonical/firefox-snap/commit/3511bc076d2ae3e467ef1eff5f77d9f0008d595d

Flags: needinfo?(bandali)

I'm going to re-run the last mozilla-central cron jobs so we might get a clue about when it broke, because playing with the linker settings did not make any difference:

for HG_COMMIT in 1d1fb9d5a4974d94e8c317f226c962fd351eae27 81869d55a8d7b3d1d573810ab4e99f63cbe11a8a e901b86e5d894b8b40093d72036208cf8837e098 8e959a7ded5f111e711c06a6728f76d3bd660699 76f22f29fcf9cd40a92a7bcf6b0a11cc782936b8 23ee4ac2d048de0aac3fa27ce7eb0925c1903096 ab110e2fcc6ab3c8e078fe970b109f482069386e bea12a00706c3b2951d467008996bf7e89d17e5b;
do
  (git checkout -b test_hg_${HG_COMMIT} $(git cinnabar hg2git 1d1fb9d5a4974d94e8c317f226c962fd351eae27);
  git cherry-pick 859bf24171fe1f0b0371174c90ec8e9488063e1a;
  git commit --amend --message "[SNAP] MOZ_PGO=1 on ${HG_COMMIT}";
  BUILD_DEBUG=1 ./mach try fuzzy --push-to-lando --full -q "'snap-upstream-build 'try");
done;
Flags: needinfo?(mozilla)
Flags: needinfo?(bandali)
Flags: needinfo?(bandali)

(In reply to :gerard-majax from comment #4)

Re-running the latest green mozilla-central cron (https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=snap-upstream&revision=d6f61c448b906c1e68cdc66920d227f008cc2db9) results in the same failure: https://treeherder.mozilla.org/jobs?repo=try&revision=ee4f4aa03a73a034f73c6b76ce9e4853a0d4be97

I am not sure this is a bug on our side.

OK, my try pushes might not be working as expected: earlier versions should have failed on applying the patch, not on the segfault.

Depends on: 1869228

(In reply to :gerard-majax from comment #3)

> I'm going to re-run last mozilla-central cronjobs so we might get a clue when it broke, because playing with the linker settings did not prove any change:
>
> for HG_COMMIT in 1d1fb9d5a4974d94e8c317f226c962fd351eae27 81869d55a8d7b3d1d573810ab4e99f63cbe11a8a e901b86e5d894b8b40093d72036208cf8837e098 8e959a7ded5f111e711c06a6728f76d3bd660699 76f22f29fcf9cd40a92a7bcf6b0a11cc782936b8 23ee4ac2d048de0aac3fa27ce7eb0925c1903096 ab110e2fcc6ab3c8e078fe970b109f482069386e bea12a00706c3b2951d467008996bf7e89d17e5b;
> do
>   (git checkout -b test_hg_${HG_COMMIT} $(git cinnabar hg2git 1d1fb9d5a4974d94e8c317f226c962fd351eae27);
>   git cherry-pick 859bf24171fe1f0b0371174c90ec8e9488063e1a;
>   git commit --amend --message "[SNAP] MOZ_PGO=1 on ${HG_COMMIT}";
>   BUILD_DEBUG=1 ./mach try fuzzy --push-to-lando --full -q "'snap-upstream-build 'try");
> done;

This was wrong since I hardcoded the revision passed to git cinnabar hg2git instead of using ${HG_COMMIT} :(. I just noticed it after running a few local builds (still processing 81869d55a8d7b3d1d573810ab4e99f63cbe11a8a):

1d1fb9d5a4974d94e8c317f226c962fd351eae27 [OK]
81869d55a8d7b3d1d573810ab4e99f63cbe11a8a [...]
e901b86e5d894b8b40093d72036208cf8837e098 [FAIL] (rkv build failure)
8e959a7ded5f111e711c06a6728f76d3bd660699 [CRASH] 
76f22f29fcf9cd40a92a7bcf6b0a11cc782936b8 [CRASH] 
23ee4ac2d048de0aac3fa27ce7eb0925c1903096 [UNTESTED] 
ab110e2fcc6ab3c8e078fe970b109f482069386e [UNTESTED]
bea12a00706c3b2951d467008996bf7e89d17e5b [CRASH]
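
A status table like this can be summarized mechanically. The following is only a sketch: summarize_bisect is a hypothetical helper name, it assumes one "<hash> [STATUS]" pair per line, and it reports results in list order without knowing anything about the actual commit history.

```shell
# Hypothetical helper: summarize a "<hash> [STATUS]" table read from stdin.
# Prints the last commit marked [OK] and the first one marked [CRASH],
# both in the order the lines appear (not in repository order).
summarize_bisect() {
  awk '
    $2 == "[OK]"                       { last_ok = $1 }
    $2 == "[CRASH]" && first_bad == "" { first_bad = $1 }
    END {
      print "last_ok=" last_ok
      print "first_crash=" first_bad
    }
  '
}

# Usage with the table above.
summarize_bisect <<'EOF'
1d1fb9d5a4974d94e8c317f226c962fd351eae27 [OK]
81869d55a8d7b3d1d573810ab4e99f63cbe11a8a [...]
e901b86e5d894b8b40093d72036208cf8837e098 [FAIL] (rkv build failure)
8e959a7ded5f111e711c06a6728f76d3bd660699 [CRASH]
76f22f29fcf9cd40a92a7bcf6b0a11cc782936b8 [CRASH]
23ee4ac2d048de0aac3fa27ce7eb0925c1903096 [UNTESTED]
ab110e2fcc6ab3c8e078fe970b109f482069386e [UNTESTED]
bea12a00706c3b2951d467008996bf7e89d17e5b [CRASH]
EOF
```

The summary is only as reliable as the table itself, so it does not replace re-running the builds.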

Correct script should be:

for HG_COMMIT in 1d1fb9d5a4974d94e8c317f226c962fd351eae27 81869d55a8d7b3d1d573810ab4e99f63cbe11a8a e901b86e5d894b8b40093d72036208cf8837e098 8e959a7ded5f111e711c06a6728f76d3bd660699 76f22f29fcf9cd40a92a7bcf6b0a11cc782936b8 23ee4ac2d048de0aac3fa27ce7eb0925c1903096 ab110e2fcc6ab3c8e078fe970b109f482069386e bea12a00706c3b2951d467008996bf7e89d17e5b;
do
  (git checkout -b test_hg_${HG_COMMIT} $(git cinnabar hg2git ${HG_COMMIT});
  git commit --amend --message "[SNAP] MOZ_PGO=1 on ${HG_COMMIT}";
  BUILD_DEBUG=1 ./mach try fuzzy --push-to-lando --full -q "'snap-upstream-build 'try");
done;
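
One way to catch a hardcoded value before pushing anything is a dry run that only prints the per-commit commands. This is a minimal sketch, not part of the actual workflow: plan_try_pushes is a hypothetical name, it prints the checkout and amend steps from the script above, and it deliberately leaves out the mach try push.

```shell
# Dry run: print the commands the loop would execute for each commit,
# without touching git or pushing to try.
# plan_try_pushes is a hypothetical helper name.
plan_try_pushes() {
  for HG_COMMIT in "$@"; do
    # Both placeholders must come from the loop variable, never a fixed hash.
    printf 'git checkout -b test_hg_%s $(git cinnabar hg2git %s)\n' "$HG_COMMIT" "$HG_COMMIT"
    printf 'git commit --amend --message "[SNAP] MOZ_PGO=1 on %s"\n' "$HG_COMMIT"
  done
}

# Usage: each printed checkout line should name the same hash twice.
plan_try_pushes 1d1fb9d5a4974d94e8c317f226c962fd351eae27 81869d55a8d7b3d1d573810ab4e99f63cbe11a8a
```

Reading the printed lines makes it obvious when every branch would be created from the same revision.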

I'm getting inconsistent results, so the bisection may need to be started over. The range https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d6f61c448b906c1e68cdc66920d227f008cc2db9&tochange=6f5601bcf5ef9fa22b46e44e1d5e34893a04fc9e should certainly cover it.

Latest braindump:

STR for local repro:

STR for try:

  • take m-c and apply this patch:
diff --git a/taskcluster/docker/snap-coreXX-build/patches/nightly/re-add-pgo.patch b/taskcluster/docker/snap-coreXX-build/patches/nightly/re-add-pgo.patch
new file mode 100644
index 0000000000000..169b626fd05ad
--- /dev/null
+++ b/taskcluster/docker/snap-coreXX-build/patches/nightly/re-add-pgo.patch
@@ -0,0 +1,15 @@
+diff --git a/snapcraft.yaml b/snapcraft.yaml
+index 5ee8164..77d1529 100644
+--- a/snapcraft.yaml
++++ b/snapcraft.yaml
+@@ -366,9 +366,7 @@ parts:
+         # Linking with gold fails on armhf (error: undefined reference to '__aeabi_uldivmod') and would need to be
+         # investigated further, and running PGO on arm64 takes forever (> 4 days in the Launchpad build environment!).
+         echo "ac_add_options --enable-linker=lld" >> $MOZCONFIG
+-        # Temporarily disable MOZ_PGO due to build failures:
+-        # https://bugzilla.mozilla.org/1869010
+-        # echo "ac_add_options MOZ_PGO=1" >> $MOZCONFIG
++        echo "ac_add_options MOZ_PGO=1" >> $MOZCONFIG
+       fi
+       if [ $CRAFT_TARGET_ARCH != "armhf" ]; then
+         echo "ac_add_options --enable-rust-simd" >> $MOZCONFIG
  • ./mach try fuzzy --full -q "'snap 'build 'try 'opt"

Triggered for a few days; can be monitored at https://treeherder.mozilla.org/jobs?repo=try&searchStr=snap%2Camd64%2Copt&author=alissy%40mozilla.com&fromchange=3d1e32cd02cf18b0e75b0da291dcacf23534c227

for HG_COMMIT in 31cf2cbea5939b2dc1639d5e011baee375fbf5f0 fd253d27e64ec2f0a321ef83bffbcf28392e9f88 b75cedb105e1b56ef1e3923c067100b6873ae53f 65a0553b296ae1432e5d18c3d2d607cacbb0a35e 556c8b22cbbbcc165c42b2c52ebdfd01be3cee6b 0ce8fcf63e627cf9d314fa8a471c3c2ad148ecd8 45383edb447de3f2958ed375a0b23b834aa60140 20cc1c7fc2f60de0739cc71c9d7bdc14ab12a55d a998d7d64649b68a9a7d81f3ea60c20da709fcc3 4028ba678039a3dd035da464b91632270e28a10b d56a4ca0375444e4e7ad2d4c1b1a2c2eca416b87 e7df49cb67e7b69c365956231cacc4ad6c9e2ef8 ac99cb1291098db9161c56ba4e184dd55edb0fb2 e6f456e40dda9f799c2fef84edc219d26631e268 bf2921de1f1d57eaf1a448425f112af0512c7612 0bd4f7d97fbb5edcc7b53fc0a05780c585109060 5d77812091445baad76c5d7a00d1ed102739c3c0 8493fb875960b426cb370c157c238bf6f607145f 407e629a0c77071d4f5bdc1560d8ce1530d5e3f6 cbfee6d8a0626dc4cea99ff66f3c41fff48c85ff 7d5aa4b996a1781850ab8f50a8846fa176288e26 69baaa4f31721fbc09b490c3cb1cdb465564e79e e901b86e5d894b8b40093d72036208cf8837e098 8e959a7ded5f111e711c06a6728f76d3bd660699 f198e511d598e9041ab3f540bf54ebe6103b88b1 ;
  do
  git checkout -b test_hg_${HG_COMMIT} $(git cinnabar hg2git ${HG_COMMIT})
  git cherry-pick 859bf24171fe1f0b0371174c90ec8e9488063e1a
  git commit --amend --message "[SNAP] MOZ_PGO=1 on ${HG_COMMIT}"
  ./mach try fuzzy --push-to-lando --full -q "'snap 'build 'try 'opt"
done;

Since some of the range has problems with applying the patch:

for HG_COMMIT in d6f61c448b906c1e68cdc66920d227f008cc2db9 786805a2cdcc4cdab632a955dace417062d0087e 28a9211ed764d0c998dd928e8b529e0e2fbb8ab8 c04135b4f2515623b31c4b2db9eb5e971754f5be 7cab7eb61b23e5915c2907759cb7c408bca4741d 1dcc9b5526c21fb2c7cc5ff2e5aaebdaf3a0648c ee0c9d2da988bf1246030c879f2337feb02163e1 b75cedb105e1b56ef1e3923c067100b6873ae53f 65a0553b296ae1432e5d18c3d2d607cacbb0a35e 83540410d1e22cc835d53d6657dd072b4a166d7f 81b765410aebee2e0c7cc973fc2d4a10895da822 010108afa299455a2d724c66e05f399601b2b2f5 a4f3f7a4d4bc59486dac183a7f8ae08f2b7bd973 b33dd40c7ea956e89c10d4e8a54f9e869a431b64 cdf72549c9509d75b53bd706c4488907a8011655 1d1fb9d5a4974d94e8c317f226c962fd351eae27;
do
  (git checkout -b test_hg_${HG_COMMIT} $(git cinnabar hg2git ${HG_COMMIT});
  git cherry-pick bug1867699_try_new_nativemessaging_patch;
  git commit --amend --message "[SNAP] MOZ_PGO=1 on ${HG_COMMIT}";
  BUILD_DEBUG=1 ./mach try fuzzy --push-to-lando --full -q "'snap 'build 'try 'opt");
done;

with:

commit 4f5c3b4910753876eeccdae374dfb9d3295b94aa (bug1867699_try_new_nativemessaging_patch)
Author: Alexandre Lissy <lissyx+mozillians@lissyx.dyndns.org>
Date:   Fri Dec 8 16:47:09 2023 +0100

    Bug 1867699 - Snap MOZ_PGO=1 no NativeMessaging patch

diff --git a/taskcluster/docker/snap-coreXX-build/patches/nightly/hack-link.patch b/taskcluster/docker/snap-coreXX-build/patches/nightly/hack-link.patch
new file mode 100644
index 0000000000000..beb87840e6a2a
--- /dev/null
+++ b/taskcluster/docker/snap-coreXX-build/patches/nightly/hack-link.patch
@@ -0,0 +1,15 @@
+diff --git a/snapcraft.yaml b/snapcraft.yaml
+index 5ee8164..2469303 100644
+--- a/snapcraft.yaml
++++ b/snapcraft.yaml
+@@ -366,9 +366,7 @@ parts:
+         # Linking with gold fails on armhf (error: undefined reference to '__aeabi_uldivmod') and would need to be
+         # investigated further, and running PGO on arm64 takes forever (> 4 days in the Launchpad build environment!).
+         echo "ac_add_options --enable-linker=lld" >> $MOZCONFIG
+-        # Temporarily disable MOZ_PGO due to build failures:
+-        # https://bugzilla.mozilla.org/1869010
+-        # echo "ac_add_options MOZ_PGO=1" >> $MOZCONFIG
++        echo "ac_add_options MOZ_PGO=1" >> $MOZCONFIG
+       fi
+       if [ $CRAFT_TARGET_ARCH != "armhf" ]; then
+         echo "ac_add_options --enable-rust-simd" >> $MOZCONFIG
diff --git a/taskcluster/docker/snap-coreXX-build/run.sh b/taskcluster/docker/snap-coreXX-build/run.sh
index 627fbd88013f4..30585ad3156f7 100755
--- a/taskcluster/docker/snap-coreXX-build/run.sh
+++ b/taskcluster/docker/snap-coreXX-build/run.sh
@@ -69,6 +69,9 @@ if [ "${TRY}" = "1" ]; then
   sed -ri 's|MOZ_SOURCE_CHANGESET=\$\{REVISION\}|MOZ_SOURCE_CHANGESET=${REVISION}|g' snapcraft.yaml
   # shellcheck disable=SC2016
   sed -ri 's|hg clone --stream \$REPO -u \$REVISION|cp -r \$SNAPCRAFT_PROJECT_DIR/gecko/. |g' snapcraft.yaml
+
+  rm patches/native-messaging-portal.patch
+  echo "" > patches/series
 fi

 if [ "${DEBUG}" = "1" ]; then

ok, so maybe the regression is from even earlier? https://hg.mozilla.org/mozilla-central/rev/d1558d374cef9d954a84b7db4646e48a4878f621 fails locally after a full clean + rebuild ...

last good build: https://treeherder.mozilla.org/logviewer?job_id=438216214&repo=mozilla-central&lineNumber=26792:

[task 2023-11-30T00:17:55.729Z] :: + REVISION=d6f61c448b906c1e68cdc66920d227f008cc2db9
[task 2023-11-30T00:17:55.729Z] :: + hg clone --stream https://hg.mozilla.org/mozilla-central -u d6f61c448b906c1e68cdc66920d227f008cc2db9 .

(In reply to :gerard-majax from comment #12)

> ok, so maybe the regression is from even earlier? https://hg.mozilla.org/mozilla-central/rev/d1558d374cef9d954a84b7db4646e48a4878f621 fails locally after a full clean + rebuild ...
>
> last good build: https://treeherder.mozilla.org/logviewer?job_id=438216214&repo=mozilla-central&lineNumber=26792:
>
> [task 2023-11-30T00:17:55.729Z] :: + REVISION=d6f61c448b906c1e68cdc66920d227f008cc2db9
> [task 2023-11-30T00:17:55.729Z] :: + hg clone --stream https://hg.mozilla.org/mozilla-central -u d6f61c448b906c1e68cdc66920d227f008cc2db9 .

Locally, even https://hg.mozilla.org/mozilla-central/rev/d6f61c448b906c1e68cdc66920d227f008cc2db9 is now failing.

Amin, builds that were working are now failing; can we start considering that the cause is outside of Firefox?

Flags: needinfo?(bandali)
Attached file snapcraft-log.zip

The snapcraft snap was updated between our last good build and the failures (revision 9726 in the older image, 10085 in the rebuilt one):

  • last docker image, built on Nov 15:
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_16202.snap --output core.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_2015.snap --output core20.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/vMTKRaLjnOJQetI78HjntT37VuoyssFE_9726.snap --output snapcraft.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/amcUKQILKXHHTlmSa7NMdnXSx02dNeeT_864.snap --output core22.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/lATO8HzwVvrAPrlZRAWpfyrJKlAJrZS3_141.snap --output gnome-42-2204.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/jZLfBRzf1cYlYysIjD2bwSzNtngY0qit_1535.snap --output gtk-common-themes.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/RrrVxM6A6RmcrRmwhCht4DbYdVILaTZy_230.snap --output gnome-42-2204-sdk.snap
  • rebuilt image, from Dec 8:
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_16202.snap --output core.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/DLqre5XGLbDqg9jPtiAhRRjDuPVa5X1q_2015.snap --output core20.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/vMTKRaLjnOJQetI78HjntT37VuoyssFE_10085.snap --output snapcraft.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/amcUKQILKXHHTlmSa7NMdnXSx02dNeeT_864.snap --output core22.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/lATO8HzwVvrAPrlZRAWpfyrJKlAJrZS3_141.snap --output gnome-42-2204.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/jZLfBRzf1cYlYysIjD2bwSzNtngY0qit_1535.snap --output gtk-common-themes.snap
+ curl -L -H 'Snap-CDN: none' https://api.snapcraft.io/api/v1/snaps/download/RrrVxM6A6RmcrRmwhCht4DbYdVILaTZy_230.snap --output gnome-42-2204-sdk.snap
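
The comparison of the two download lists can be automated. This is a rough sketch under a few assumptions: each list is saved to its own file (the file names in the usage comment are hypothetical), every line has the same "curl ... <id>_<revision>.snap --output <name>.snap" shape as above, and snap_revisions and changed_snaps are made-up helper names.

```shell
# Extract "<name> <revision>" pairs from curl lines of the form
#   ... /download/<snap-id>_<revision>.snap --output <name>.snap
snap_revisions() {
  sed -n 's|.*/download/[^_]*_\([0-9][0-9]*\)\.snap --output \(.*\)\.snap$|\2 \1|p'
}

# Print every snap whose revision differs between two saved curl lists.
# $1: file with the older list, $2: file with the newer list.
changed_snaps() {
  {
    snap_revisions < "$1" | sed 's/^/old /'
    snap_revisions < "$2" | sed 's/^/new /'
  } | awk '
    $1 == "old"                  { old[$2] = $3 }
    $1 == "new" && old[$2] != $3 { print $2 ": " old[$2] " -> " $3 }
  '
}

# Usage (hypothetical file names):
#   changed_snaps image-nov15.log image-dec8.log
```

Run against the two lists above, only the snapcraft entry should be reported, since every other snap is downloaded at the same revision in both images.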

Getting a crash report off the build is enlightening:

[task 2023-12-13T09:00:53.239Z] :: mozcrash checking /tmp/tmpiel8n6fs/minidumps for minidumps...
[task 2023-12-13T09:00:53.239Z] :: PROCESS-CRASH | None [@ libgallium_dri.so + 0x0000000000704fc0] | Profiling run
[task 2023-12-13T09:00:53.239Z] :: Crash dump filename: /tmp/tmpiel8n6fs/minidumps/494db993-edbe-c587-45e5-f338b6935c3e.dmp
[task 2023-12-13T09:00:53.239Z] :: Process type: main
[task 2023-12-13T09:00:53.239Z] :: Process pid: unknown
[task 2023-12-13T09:00:53.239Z] :: Operating system: Linux
[task 2023-12-13T09:00:53.239Z] ::                   5.4.0-1106-gcp #115~18.04.1-Ubuntu SMP Mon May 22 20:46:39 UTC 2023
[task 2023-12-13T09:00:53.239Z] :: CPU: amd64
[task 2023-12-13T09:00:53.239Z] ::      family 6 model 85 stepping 7
[task 2023-12-13T09:00:53.239Z] ::      16 CPUs
[task 2023-12-13T09:00:53.239Z] :: Linux Ubuntu 22.04 - jammy (Ubuntu 22.04.3 LTS)
[task 2023-12-13T09:00:53.239Z] ::
[task 2023-12-13T09:00:53.239Z] :: Crash reason:  SIGSEGV / SI_KERNEL
[task 2023-12-13T09:00:53.239Z] :: Crash address: 0xe5e5e5e5e5e5e6e5
[task 2023-12-13T09:00:53.239Z] :: Crashing instruction: `mov edx, dword [rcx + 0x100]`
[task 2023-12-13T09:00:53.239Z] :: Memory accessed by instruction:
[task 2023-12-13T09:00:53.239Z] ::   0. Address: 0xe5e5e5e5e5e5e6e5
[task 2023-12-13T09:00:53.239Z] ::      Size: 4
[task 2023-12-13T09:00:53.239Z] :: Process uptime: not available
[task 2023-12-13T09:00:53.239Z] ::
[task 2023-12-13T09:00:53.239Z] :: Thread 29 CanvasRenderer (crashed)
[task 2023-12-13T09:00:53.239Z] ::  0  libgallium_dri.so + 0x704fc0
[task 2023-12-13T09:00:53.239Z] ::      rax = 0x00007f0d5db5db80    rdx = 0x00000000e5e5e5e5
[task 2023-12-13T09:00:53.239Z] ::      rcx = 0xe5e5e5e5e5e5e5e5    rbx = 0x00007f0d5523fc20
[task 2023-12-13T09:00:53.239Z] ::      rsi = 0x00007f0d5523fc20    rdi = 0x00007f0d73dc3d08
[task 2023-12-13T09:00:53.239Z] ::      rbp = 0x00007f0d73d89000    rsp = 0x00007f0d753fdb68
[task 2023-12-13T09:00:53.239Z] ::       r8 = 0x0000000000000000     r9 = 0x0000000000000000
[task 2023-12-13T09:00:53.239Z] ::      r10 = 0x00007f0d753fdc70    r11 = 0x0000000000000000
[task 2023-12-13T09:00:53.239Z] ::      r12 = 0x0000000000000000    r13 = 0x0000000000000000
[task 2023-12-13T09:00:53.239Z] ::      r14 = 0x00007f0d73e36000    r15 = 0x00007f0da121f2e8
[task 2023-12-13T09:00:53.239Z] ::      rip = 0x00007f0d8a8d4fc0
[task 2023-12-13T09:00:53.240Z] ::     Found by: given as instruction pointer in context
[task 2023-12-13T09:00:53.240Z] ::  1  libgallium_dri.so + 0x707fe8
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdb70    rip = 0x00007f0d8a8d7fe9
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  2  libgallium_dri.so + 0x6f60dd
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdb90    rip = 0x00007f0d8a8c60de
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  3  libgallium_dri.so + 0x723f12
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdbf0    rip = 0x00007f0d8a8f3f13
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  4  libgallium_dri.so + 0x1553dc8
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdbf8    rip = 0x00007f0d8b723dc9
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  5  libgallium_dri.so + 0x723eb0
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdc00    rip = 0x00007f0d8a8f3eb1
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  6  libgallium_dri.so + 0x3c6b7a
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdc60    rip = 0x00007f0d8a596b7b
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning
[task 2023-12-13T09:00:53.240Z] ::  7  libgallium_dri.so + 0x3c6b45
[task 2023-12-13T09:00:53.240Z] ::      rsp = 0x00007f0d753fdc70    rip = 0x00007f0d8a596b46
[task 2023-12-13T09:00:53.240Z] ::     Found by: stack scanning

(unfortunately no debug info)
So the crash is happening in a system library and, since the crash address matches our poisoning pattern (rcx holds 0xe5e5e5e5e5e5e5e5), it involves a use-after-free.
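
To spell out the reasoning: the crashing instruction reads from [rcx + 0x100] with rcx = 0xe5e5e5e5e5e5e5e5, which gives exactly the reported crash address 0xe5e5e5e5e5e5e6e5. A quick way to spot such addresses in a dump is to count how many of their bytes are 0xe5. The sketch below is only a heuristic: count_poison_bytes is a hypothetical helper name, and it assumes a hex address with an even number of digits.

```shell
# Count how many bytes of a hex address equal 0xe5 (the poison byte seen
# in the registers above). A high count suggests a pointer that was read
# from freed, poisoned memory, possibly with a small offset added.
count_poison_bytes() {
  # Strip an optional 0x prefix and normalize to lowercase.
  hex=$(printf '%s' "${1#0x}" | tr 'A-F' 'a-f')
  # Split into two-digit bytes and count the ones that are exactly e5.
  printf '%s\n' "$hex" | fold -w 2 | grep -c '^e5$'
}

count_poison_bytes 0xe5e5e5e5e5e5e6e5   # crash address: prints 7
count_poison_bytes 0xe5e5e5e5e5e5e5e5   # rcx: prints 8
```

Seven of eight bytes matching, with the odd one explained by the 0x100 offset, is consistent with a dereference through a pointer loaded from freed memory.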

Flags: needinfo?(bandali)

There we go, a trace with symbols for system libraries:

[task 2023-12-13T23:05:44.813Z] :: PROCESS-CRASH | None [@ lp_scene_is_resource_referenced] | Profiling run
[task 2023-12-13T23:05:44.813Z] :: Crash dump filename: /tmp/tmpz2jhn2ik/minidumps/7adf2fbb-eee7-d29b-90fa-c6608f7c0c07.dmp
[task 2023-12-13T23:05:44.813Z] :: Process type: main
[task 2023-12-13T23:05:44.813Z] :: Process pid: unknown
[task 2023-12-13T23:05:44.813Z] :: Operating system: Linux
[task 2023-12-13T23:05:44.813Z] ::                   5.4.0-1106-gcp #115~18.04.1-Ubuntu SMP Mon May 22 20:46:39 UTC 2023
[task 2023-12-13T23:05:44.813Z] :: CPU: amd64
[task 2023-12-13T23:05:44.813Z] ::      family 6 model 85 stepping 7
[task 2023-12-13T23:05:44.813Z] ::      16 CPUs
[task 2023-12-13T23:05:44.813Z] :: Linux Ubuntu 22.04 - jammy (Ubuntu 22.04.3 LTS)
[task 2023-12-13T23:05:44.813Z] ::
[task 2023-12-13T23:05:44.813Z] :: Crash reason:  SIGSEGV / SI_KERNEL
[task 2023-12-13T23:05:44.813Z] :: Crash address: 0xe5e5e5e5e5e5e6e5
[task 2023-12-13T23:05:44.813Z] :: Crashing instruction: `mov edx, dword [rcx + 0x100]`
[task 2023-12-13T23:05:44.813Z] :: Memory accessed by instruction:
[task 2023-12-13T23:05:44.813Z] ::   0. Address: 0xe5e5e5e5e5e5e6e5
[task 2023-12-13T23:05:44.813Z] ::      Size: 4
[task 2023-12-13T23:05:44.813Z] :: Process uptime: not available
[task 2023-12-13T23:05:44.813Z] ::
[task 2023-12-13T23:05:44.813Z] :: Thread 29 CanvasRenderer (crashed)
[task 2023-12-13T23:05:44.813Z] ::  0  libgallium_dri.so!lp_scene_is_resource_referenced [lp_scene.c : 519 + 0x0]
[task 2023-12-13T23:05:44.813Z] ::      rax = 0x00007fd88dccea90    rdx = 0x00000000e5e5e5e5
[task 2023-12-13T23:05:44.813Z] ::      rcx = 0xe5e5e5e5e5e5e5e5    rbx = 0x00007fd88417bf90
[task 2023-12-13T23:05:44.813Z] ::      rsi = 0x00007fd88417bf90    rdi = 0x00007fd8a10c3d08
[task 2023-12-13T23:05:44.813Z] ::      rbp = 0x00007fd8a1089000    rsp = 0x00007fd8a26fdb68
[task 2023-12-13T23:05:44.813Z] ::       r8 = 0x0000000000000000     r9 = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      r10 = 0x00007fd8a26fdc70    r11 = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      r12 = 0x0000000000000000    r13 = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      r14 = 0x00007fd8a1136000    r15 = 0x00007fd8ce21b6e8
[task 2023-12-13T23:05:44.813Z] ::      rip = 0x00007fd8b79b2fc0
[task 2023-12-13T23:05:44.813Z] ::     Found by: given as instruction pointer in context
[task 2023-12-13T23:05:44.813Z] ::  1  libgallium_dri.so!lp_setup_is_resource_referenced [lp_setup.c : 1130 + 0x7]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x00007fd88417bf90    rbp = 0x00007fd8a1089000
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fdb70    r12 = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000000000    r14 = 0x00007fd8a1136000
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x00007fd8ce21b6e8    rip = 0x00007fd8b79b5fe9
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  2  libgallium_dri.so!llvmpipe_flush_resource [lp_flush.c : 127 + 0xd]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x0000000000000000    rbp = 0x00007fd8ce21b710
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fdb90    r12 = 0x00007fd88417bf90
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000000000    r14 = 0x00007fd8a1136000
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x00007fd8ce21b6e8    rip = 0x00007fd8b79a40de
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  3  libgallium_dri.so!llvmpipe_set_sampler_views [lp_state_sampler.c : 152 + 0x25]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x0000000000000004    rbp = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fdbf0    r12 = 0x00007fd87a2b7040
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x00007fd8a94e1000    r14 = 0x00007fd8a26fdc70
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x0000000000000000    rip = 0x00007fd8b79d1f13
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  4  libgallium_dri.so!update_textures [st_atom_texture.c : 272 + 0xe]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x00007fd87b1db000    rbp = 0x0000000000000001
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fdc60    r12 = 0x0000000000000004
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000000284    r14 = 0x00007fd8a94e1000
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x00007fd8a26fdc70    rip = 0x00007fd8b7674b7b
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  5  libgallium_dri.so!st_validate_state [st_util.h : 128]
[task 2023-12-13T23:05:44.813Z] ::     Found by: inlining
[task 2023-12-13T23:05:44.813Z] ::  6  libgallium_dri.so!prepare_draw [st_draw.c : 88 + 0x54]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x0080001128080000    rbp = 0x0080001128080800
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fddc0    r12 = 0x0000000000000001
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x00007fd8b8f6f760    r14 = 0x00007fd87b1db000
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x0000000000000800    rip = 0x00007fd8b74164b6
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  7  libgallium_dri.so!st_draw_gallium [st_draw.c : 141 + 0x7]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x00007fd8a26fde44    rbp = 0x00007fd8a26fde50
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fde00    r12 = 0x00007fd87ad1b000
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000000001    r14 = 0x0000000000000000
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x00007fd87b1db000    rip = 0x00007fd8b741682a
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  8  libgallium_dri.so!_mesa_draw_arrays [draw.c : 1202 + 0x5]
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x00007fd88b7caf00    rbp = 0x00007fd8a26fdec0
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fde40    r12 = 0x000000000000004e
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000001e73    r14 = 0x00007fd8b928f620
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x0000000000000001    rip = 0x00007fd8b75a1b55
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info
[task 2023-12-13T23:05:44.813Z] ::  9  libxul.so + 0x63c3d1d
[task 2023-12-13T23:05:44.813Z] ::      rbx = 0x00007fd88b7caf00    rbp = 0x00007fd8a26fdec0
[task 2023-12-13T23:05:44.813Z] ::      rsp = 0x00007fd8a26fde90    r12 = 0x000000000000004e
[task 2023-12-13T23:05:44.813Z] ::      r13 = 0x0000000000001e73    r14 = 0x00007fd8b928f620
[task 2023-12-13T23:05:44.813Z] ::      r15 = 0x0000000000000001    rip = 0x00007fd8bfcd5d1e
[task 2023-12-13T23:05:44.813Z] ::     Found by: call frame info

https://docs.mesa3d.org/relnotes/23.1.6.html notably mentions a use-after-free (UAF) fix in lp_scene_is_resource_referenced. The version in use is 23.0.4; jammy-proposed apparently has 23.2.1.
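
For what it's worth, the register dump in frame 0 looks consistent with a use-after-free: rdx and rcx hold a repeated 0xe5 byte, which is the fill value mozjemalloc writes over freed memory. Below is a minimal Python sketch that scans a pasted frame for such values. The helper name `poisoned_registers` and the regular expression are my own for illustration, not part of any Mozilla tooling, and it assumes the 0xe5 pattern really is allocator poison in this process.

```python
import re

# Matches "name = 0x..." pairs as printed in the stack trace above
# (rax, rdx, r8, r10, rip, ...).
REGISTER_RE = re.compile(r"\b(r[a-z0-9]{1,2})\s*=\s*0x([0-9a-fA-F]+)")


def poisoned_registers(frame_text, poison_byte="e5"):
    """Return names of registers whose low 32 bits are all the poison byte."""
    hits = []
    for name, value in REGISTER_RE.findall(frame_text):
        # Assumption: a repeated 0xe5 byte is the allocator's freed-memory fill.
        if value.lower().endswith(poison_byte * 4):
            hits.append(name)
    return hits


# Two lines copied from frame 0 of the crashing thread.
frame0 = """
rax = 0x00007fd88dccea90    rdx = 0x00000000e5e5e5e5
rcx = 0xe5e5e5e5e5e5e5e5    rbx = 0x00007fd88417bf90
"""
print(poisoned_registers(frame0))
```

A hit is only a hint that a freed object was read; it doesn't prove which object was freed or by whom.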

Upgrading mesa drivers to the version in jammy-proposed "works", but only kind of accidentally: libGL error: DRI driver not from this Mesa build ('23.2.1-1ubuntu3.1~22.04.1' vs '23.0.4-0ubuntu1~22.04.1') (presumably because the mesa in the snap environment is still 23.0.4)
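
To make the mismatch easier to spot in build logs, here is a rough Python sketch that pulls the two version strings out of that libGL message and checks each against 23.1.6. The function names are hypothetical, and it assumes every upstream release at or after 23.1.6 carries the fix, which I have not verified for each release.

```python
import re

# Pattern for the libGL message quoted above; the two quoted strings are
# Ubuntu package versions.
MISMATCH_RE = re.compile(
    r"DRI driver not from this Mesa build \('([^']+)' vs '([^']+)'\)"
)

# Assumption: any upstream release >= 23.1.6 includes the UAF fix.
FIXED_IN = (23, 1, 6)


def upstream_version(package_version):
    """Reduce '23.2.1-1ubuntu3.1~22.04.1' to the tuple (23, 2, 1)."""
    upstream = package_version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def check_mismatch(log_line):
    """Return [(version, has_fix), ...] for both sides, or None if no match."""
    match = MISMATCH_RE.search(log_line)
    if match is None:
        return None
    versions = [upstream_version(v) for v in match.groups()]
    return [(v, v >= FIXED_IN) for v in versions]


line = ("libGL error: DRI driver not from this Mesa build "
        "('23.2.1-1ubuntu3.1~22.04.1' vs '23.0.4-0ubuntu1~22.04.1')")
print(check_mismatch(line))
```

On the line above, only the first version (23.2.1) is at or past 23.1.6, which matches the guess that the rest of the snap environment is still on 23.0.4.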

Blocks: snap

I was told a few weeks ago (I mentioned it in https://bugzilla.mozilla.org/show_bug.cgi?id=1859291#c30) that there would be a Mesa update on the 22.04 base, which is likely the one you tested in proposed. Once it lands, an update of the GNOME snap should pull in the newer version. I'm worried this will take long enough for the PGO breakage to reach beta and maybe release.

Duplicate of this bug: 1870210

Giving a status update here. We didn't manage to work around the issue easily, and because of the holidays the previously mentioned SRU isn't going to move to -updates and land in the gnome-sdk snap before the end of the year, so we have disabled PGO for nightly and now for beta. The plan is to try to get the fixed version into the gnome-sdk after the holidays and then re-enable PGO in time for 122.

I have an extra question to add here. The issue impacts the snap, but if it's due to a bug in the mesa version currently in jammy, it should also impact a source or deb build the same way if PGO is turned on, right? And does it mean we are catching a bug at build time that 22.04 users will likely also hit at runtime (as well as users on other distributions based on the same mesa version)? If that's the case, it might still be worth working around the problem somehow on the Firefox side, because you can't really rely on Firefox users having a base system that includes the fix.

$ ll firefox_123.0a1_amd64.*
-rw-r--r-- 1 alexandre alexandre 140M 10 janv. 10:22 firefox_123.0a1_amd64.debug
-rw-r--r-- 1 alexandre alexandre 281M 10 janv. 10:22 firefox_123.0a1_amd64.snap
Assignee: nobody → lissyx+mozillians
Flags: needinfo?(bandali)
Blocks: 1873861

Both PRs were merged, and PGO is back on for those. Builds on Launchpad are green as well.

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Component: General → Third Party Packaging

As a followup update, with newer mesa 23.2.1-1ubuntu3.1~22.04.2 having made its way into the -updates pocket of jammy (Ubuntu 22.04 LTS) and subsequently into the gnome-sdk snap, it is now available to the Firefox snap (and others). As such, Alexandre's earlier workaround of forcing software WebRender to remedy PGO build failures is no longer necessary, and I've dropped it from nightly, beta, and stable 126.0 candidate.
