Open Bug 1870982 Opened 2 years ago Updated 2 years ago

Intermittent Android 7.0 failures Unsuccessful task run with exit code: 1 - python version is older than required

Categories

(Testing :: General, defect, P2)

Tracking

(Not tracked)

People

(Reporter: nataliaCs, Unassigned)

Details

[task 2023-12-20T03:07:05.688Z] Running: python3 /builds/worker/checkouts/gecko/mach python /builds/worker/workspace/mozharness/scripts/web_platform_tests.py  --config-file /builds/worker/workspace/mozharness/configs/android/android-x86_64.py --config-file /builds/worker/workspace/mozharness/configs/web_platform_tests/prod_config_android.py  --test-type=testharness --skip-implementation-status=backlog --skip-implementation-status=not-implementing --skip-timeout --skip-crash --exclude-tag=webgpu --exclude-tag=canvas --disable-fission --setpref=media.peerconnection.mtransport_process=false --setpref=network.process.enabled=false --setpref=layers.d3d11.enable-blacklist=false --download-symbols=ondemand 
[task 2023-12-20T03:07:05.715Z] Python 3.8+ is required to run mach.
[task 2023-12-20T03:07:05.715Z] You are running Mach with Python 3.7.5
[task 2023-12-20T03:07:05.715Z] See https://firefox-source-docs.mozilla.org/setup/linux_build.html#installingpython
[task 2023-12-20T03:07:05.715Z] for guidance on how to install Python on your system.
[task 2023-12-20T03:07:05.718Z] cleanup
[task 2023-12-20T03:07:05.718Z] + cleanup
[task 2023-12-20T03:07:05.718Z] + local rv=1
[task 2023-12-20T03:07:05.718Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2023-12-20T03:07:05.718Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2023-12-20T03:07:05.720Z] + '[' ']'
[task 2023-12-20T03:07:05.720Z] + true
[task 2023-12-20T03:07:05.720Z] + cleanup_xvfb
[task 2023-12-20T03:07:05.720Z] ++ pidof Xvfb
[task 2023-12-20T03:07:05.722Z] + local xvfb_pid=56
[task 2023-12-20T03:07:05.722Z] + local vnc=false
[task 2023-12-20T03:07:05.722Z] + local interactive=false
[task 2023-12-20T03:07:05.722Z] + '[' -n 56 ']'
[task 2023-12-20T03:07:05.722Z] + [[ false == false ]]
[task 2023-12-20T03:07:05.722Z] + [[ false == false ]]
[task 2023-12-20T03:07:05.722Z] + kill 56
[task 2023-12-20T03:07:05.722Z] + screen -XS xvfb quit
[task 2023-12-20T03:07:05.724Z] + exit 1
[taskcluster 2023-12-20 03:07:08.370Z] === Task Finished ===
[taskcluster 2023-12-20 03:07:09.022Z] Unsuccessful task run with exit code: 1 completed in 100.902 seconds

It looks like the test tasks use whatever checkout happens to be around on the worker, which does not guarantee a tree that matches the current push. In this case, the worker probably had a checkout from central or beta, which requires Python 3.8, while the push was from mozilla-release. The Docker images from mozilla-release don't have Python 3.8; the ones from beta/central do.
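To illustrate the mismatch described above, here is a minimal sketch of a minimum-version gate. It is not mach's actual code: the function name `version_gate` and the `MIN_PYTHON` constant are made up for illustration, and the message text is copied from the log lines in this bug.

```python
from typing import Optional, Tuple

# Assumed minimum, taken from the "Python 3.8+ is required" log line.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def version_gate(running: Tuple[int, int, int],
                 minimum: Tuple[int, int] = MIN_PYTHON) -> Optional[str]:
    """Return an error message if `running` is older than `minimum`, else None."""
    if running[:2] < minimum:
        need = ".".join(map(str, minimum))
        have = ".".join(map(str, running))
        return (f"Python {need}+ is required to run mach.\n"
                f"You are running Mach with Python {have}")
    return None


# A mozilla-release image (Python 3.7.5) meeting a checkout that requires 3.8:
print(version_gate((3, 7, 5)))
# An image with Python 3.8 or newer passes the gate:
print(version_gate((3, 8, 10)))
```

The point is only that the gate lives in the checkout, not in the image, so a stale checkout from a newer branch can reject an older image's interpreter.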

Flags: needinfo?(jmaher)
Flags: needinfo?(ahal)
Product: Firefox Build System → Testing

Bug 1843209 migrated to mozilla-beta on Dec 18th, which was merge day, and with it came the Python 3.8 requirement; previously mozilla-beta was running Python 3.7.5. When it merged, the same task on beta used the ubuntu1804-test image (built from autoland 8 days earlier: https://firefox-ci-tc.services.mozilla.com/tasks/X4ORcEmsSJaZx_7oyLhCIw ). Today the same Docker image from autoland is still used on mozilla-beta for the wpt android lite tasks.

What is odd is that retriggers are green on mozilla-release, as are other tasks. Both the failing and the passing retriggers use the same Docker image, built from beta on December 7th (https://firefox-ci-tc.services.mozilla.com/tasks/cJCZU90jTsWHsJm1_2O-IA).

Why do we have different docker image bases? This seems like a task graph dependency issue?

Looking at the above try push, you can see:

  - P-IVao4hT6S_hV6ZFD5bgg
  - TXQtv_ViRueEF565Ql5lOw
  - a2FlXVSOQou0DeVAYyepPA
  - cJCZU90jTsWHsJm1_2O-IA
  - dcCAhtrQSNGfy68R7l0Dhw
  - f2fTnEdAQ8yJOMMYn6dnPg
  - fN6H5bi-QdWoyc8kIRrmwA

The task runs ./mach:

[task 2023-12-20T03:07:05.686Z] + /builds/worker/bin/run-mozharness
[task 2023-12-20T03:07:05.688Z] Running: python3 /builds/worker/checkouts/gecko/mach python /builds/worker/workspace/mozharness/scripts/web_platform_tests.py  --config-file /builds/worker/workspace/mozharness/configs/android/android-x86_64.py --config-file /builds/worker/workspace/mozharness/configs/web_platform_tests/prod_config_android.py  --test-type=testharness --skip-implementation-status=backlog --skip-implementation-status=not-implementing --skip-timeout --skip-crash --exclude-tag=webgpu --exclude-tag=canvas --disable-fission --setpref=media.peerconnection.mtransport_process=false --setpref=network.process.enabled=false --setpref=layers.d3d11.enable-blacklist=false --download-symbols=ondemand 
[task 2023-12-20T03:07:05.715Z] Python 3.8+ is required to run mach.
[task 2023-12-20T03:07:05.715Z] You are running Mach with Python 3.7.5
[task 2023-12-20T03:07:05.715Z] See https://firefox-source-docs.mozilla.org/setup/linux_build.html#installingpython
[task 2023-12-20T03:07:05.715Z] for guidance on how to install Python on your system.
[task 2023-12-20T03:07:05.718Z] cleanup
  - OnR89MlTSlShCrdf2F_VSA <- new in retrigger, task label: Action: Retrigger
  - P-IVao4hT6S_hV6ZFD5bgg
  - TXQtv_ViRueEF565Ql5lOw
  - a2FlXVSOQou0DeVAYyepPA
  - cJCZU90jTsWHsJm1_2O-IA
  - dcCAhtrQSNGfy68R7l0Dhw
  - f2fTnEdAQ8yJOMMYn6dnPg
  - fN6H5bi-QdWoyc8kIRrmwA
  - XnAx3iZBTdGbje-69txv6Q <- new in retrigger, task label: Gecko Decision Task

The task doesn't run mach, but runs the Python harness directly:

[task 2023-12-20T03:23:24.691Z] + /builds/worker/bin/run-mozharness
[task 2023-12-20T03:23:24.693Z] Running: python3 /builds/worker/workspace/mozharness/scripts/web_platform_tests.py  --config-file /builds/worker/workspace/mozharness/configs/android/android-x86_64.py --config-file /builds/worker/workspace/mozharness/configs/web_platform_tests/prod_config_android.py  --test-type=testharness --skip-implementation-status=backlog --skip-implementation-status=not-implementing --skip-timeout --skip-crash --exclude-tag=webgpu --exclude-tag=canvas --disable-fission --setpref=media.peerconnection.mtransport_process=false --setpref=network.process.enabled=false --setpref=layers.d3d11.enable-blacklist=false --download-symbols=ondemand 
[task 2023-12-20T03:23:24.910Z] 03:23:24     INFO - ConsoleLogger online at 20231220 03:23:24Z in /builds/worker/workspace
[task 2023-12-20T03:23:24.911Z] 03:23:24     INFO - Run as /builds/worker/workspace/mozharness/scripts/web_platform_tests.py --config-file /builds/worker/workspace/mozharness/configs/android/android-x86_64.py --config-file /builds/worker/workspace/mozharness/configs/web_platform_tests/prod_config_android.py --test-type=testharness --skip-implementation-status=backlog --skip-implementation-status=not-implementing --skip-timeout --skip-crash --exclude-tag=webgpu --exclude-tag=canvas --disable-fission --setpref=media.peerconnection.mtransport_process=false --setpref=network.process.enabled=false --setpref=layers.d3d11.enable-blacklist=false --download-symbols=ondemand
[task 2023-12-20T03:23:24.919Z] 03:23:24     INFO - Dumping config to /builds/worker/workspace/logs/localconfig.json.
[task 2023-12-20T03:23:24.921Z] 03:23:24     INFO - {'allow_software_gl_layers': False,
[task 2023-12-20T03:23:24.921Z] 03:23:24     INFO -  'android_version': 24,
[task 2023-12-20T03:23:24.921Z] 03:23:24     INFO -  'append_to_log': False,
[task 2023-12-20T03:23:24.921Z] 03:23:24     INFO -  'backlog': False,

I have no idea why web-platform-tests in this one case would be attempting to run mach instead of the python harness.

Flags: needinfo?(jmaher)

> I have no idea why web-platform-tests in this one case would be attempting to run mach instead of the python harness.

Because of this:
https://searchfox.org/mozilla-central/rev/b580e3f77470b2337bc8ae032b58a85c11e66aba/taskcluster/scripts/tester/test-linux.sh#259

What is GECKO_PATH? Per the log:

[setup 2023-12-20T03:06:47.038Z] GECKO_PATH is /builds/worker/checkouts/gecko

And that directory comes from a cache:

[taskcluster 2023-12-20 03:05:28.120Z] using cache "gecko-level-3-checkouts-hg58-v3-c52fd4fedd061ea8b8e3" -> /builds/worker/checkouts

Nothing else in the log shows an explicit checkout happening. Unfortunately, the worker is not available anymore, so it's not possible to see which previous task may have left a checkout behind, but a previous task would be the only reason for the checkout to be there.
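A rough model of the behaviour described above, written in Python rather than shell. It is a sketch, not the real test-linux.sh logic: the function name `harness_command` and the exact test (presence of a `mach` file under GECKO_PATH) are assumptions based on the two command lines seen in the logs.

```python
import os
import tempfile
from typing import List


def harness_command(gecko_path: str, script: str) -> List[str]:
    """Model of the checkout detection (assumed, not the real script)."""
    mach = os.path.join(gecko_path, "mach")
    if os.path.exists(mach):
        # A checkout is present, so the script is routed through mach.
        return ["python3", mach, "python", script]
    # No checkout: the harness runs under the image's Python directly.
    return ["python3", script]


script = "/builds/worker/workspace/mozharness/scripts/web_platform_tests.py"
with tempfile.TemporaryDirectory() as cache:
    gecko = os.path.join(cache, "gecko")
    # Cache without a checkout: direct invocation, as in the passing log.
    print(harness_command(gecko, script))
    # Cache where an earlier task left a checkout: mach is picked up,
    # as in the failing log.
    os.makedirs(gecko)
    open(os.path.join(gecko, "mach"), "w").close()
    print(harness_command(gecko, script))
```

The same task definition yields two different commands depending only on what the cache happens to contain, which would explain why the failure is intermittent.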

Flags: needinfo?(jmaher)

I am not sure of the best solution here. It sounds like we won't see this again unless it recurs on mozilla-release, or in the future when the Python version available in the image conflicts with the version mach requires.

We could put an attribute in the tasks to indicate TASK_USE_CHECKOUT, then set that as an environment variable so that test-linux.sh uses mach only if that environment variable is true.
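A possible shape for that guard, sketched in Python for illustration (the real change would live in test-linux.sh). The helper names and the accepted values "1"/"true" are assumptions; only the TASK_USE_CHECKOUT name comes from the proposal above.

```python
import os
from typing import List, Mapping


def wants_checkout(env: Mapping[str, str]) -> bool:
    # Assumption: "1" or "true" (any case) counts as opting in.
    return env.get("TASK_USE_CHECKOUT", "").strip().lower() in ("1", "true")


def harness_command(env: Mapping[str, str], gecko_path: str,
                    script: str, checkout_present: bool) -> List[str]:
    """Use mach only when the task opted in AND a checkout is present."""
    if wants_checkout(env) and checkout_present:
        return ["python3", os.path.join(gecko_path, "mach"), "python", script]
    return ["python3", script]


gecko = "/builds/worker/checkouts/gecko"
script = "web_platform_tests.py"
# Stale checkout in the cache, but the task never asked for one:
print(harness_command({}, gecko, script, checkout_present=True))
# Task explicitly opted in and has a checkout:
print(harness_command({"TASK_USE_CHECKOUT": "true"}, gecko, script,
                      checkout_present=True))
```

With this guard a leftover checkout in the cache no longer changes how the harness is launched.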

:glandium, does this sound reasonable, or are you looking to solve something else here?

Flags: needinfo?(jmaher)

> Because of this:
> https://searchfox.org/mozilla-central/rev/b580e3f77470b2337bc8ae032b58a85c11e66aba/taskcluster/scripts/tester/test-linux.sh#259

Yeah, so this seems like the bug here. It's assuming that if a checkout exists, then the task is meant to run from that checkout, which clearly isn't the case due to the cache.

Joel's approach sounds reasonable to me and is probably simplest. Maybe a more proper solution would be to remove that line and push the logic that determines which Python to use into the task definitions, but that will be a bit annoying to do as it's only the ubuntu1804-test image that's using this script atm, so you'd need to add Taskgraph logic that tightly couples to this image somewhere.

Flags: needinfo?(ahal)

Maybe another angle here: if the task doesn't need a checkout, why does it mount a checkout cache volume in the first place? I'm guessing the run_task transforms just add it regardless. IIRC tasks can opt out, but they need to explicitly pass use-caches: false or something like that. Maybe we can be a bit smarter about automatically determining whether checkout caches are actually needed or not.
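One way to picture that idea, as a purely hypothetical sketch: the `needs-checkout` key, the `caches_for` helper and the cache name are invented for illustration and are not Taskgraph APIs. The checkout cache is mounted only for tasks that declare they need a checkout, instead of being added by default.

```python
from typing import Dict, List


def caches_for(task: Dict) -> List[str]:
    """Hypothetical: pick cache mounts from what the task declares it needs."""
    caches: List[str] = []
    # "needs-checkout" is an invented key; absent means no checkout needed.
    if task.get("needs-checkout", False):
        caches.append("checkouts")
    return caches


# A test task that only runs the mozharness script from its workspace:
print(caches_for({"label": "wpt-android"}))
# A task that really runs from a source checkout:
print(caches_for({"label": "source-test", "needs-checkout": True}))
```

A task without the cache mounted would have no stale checkout to trip over in the first place.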

This is where the checkout setup comes from:
https://searchfox.org/mozilla-central/rev/a9cb718ef1502bc0fe5088476748e334fad9d6a1/taskcluster/gecko_taskgraph/transforms/job/mozharness_test.py#188-190

And this comes from bug 1304484.

We'd want to split the setting of the environment variables out of support_vcs_checkout.

The severity field is not set for this bug.
:jmaher, could you have a look please?

Flags: needinfo?(jmaher)
Severity: -- → S4
Flags: needinfo?(jmaher)
Priority: -- → P2