Sometimes all chunks for a test suite with a specific config on a platform get scheduled; fewer chunks should be used instead (= less test suite overhead like task setup)
Categories
(Firefox Build System :: Task Configuration, task)
Tracking
(firefox121 affected)
| Tracking | Status
---|---|---
firefox121 | --- | affected
People
(Reporter: aryx, Unassigned)
Details
Almost every push on autoland gets all web platform tasks for Android 7.0 x86-64 scheduled.
An exception with only few tasks is this one.
We schedule every task if a task is new (its name label got added/changed), but this does not apply here.
The task count per push indicates this started on 2023-10-19 but not immediately - there are still a few pushes which got optimized - tasks deemed unnecessary got removed.
The log of a decision task for a push which scheduled all these tasks contains lines like this one:
optimize: test-android-em-7.0-x86_64-lite-qr/opt-geckoview-web-platform-tests-nofis-1 kept because of test (skip-unless-backstop-and-not-skip-unless-backstop-and-skip-unless-push-interval-10.0-and-skip-unless-schedules-or-bugbug-reduced-manifests-fallback-last-10-pushes-or-platform-disperse-or-skip-unless-backstop-and-skip-unless-push-interval-10.0-and-skip-unless-schedules-or-bugbug-reduced-manifests-fallback-low-or-platform-disperse)
Marco, can you investigate what is happening here? The tasks sometimes get backlogged because the pool is insufficient for such a load.
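To see how many tasks of a given platform a push kept, and for which reason, the decision task log can be tallied. A minimal sketch, assuming every relevant line has the same shape as the one quoted above; the regular expression and the `count_kept` helper are illustrative and not part of gecko_taskgraph:

```python
import re
from collections import Counter

# Pattern inferred from the single log line quoted in this bug (an assumption):
#   optimize: <label> kept because of <reason> (<strategy name>)
KEPT_RE = re.compile(
    r"optimize: (?P<label>\S+) kept because of (?P<reason>\S+) \((?P<strategy>[^)]*)\)"
)


def count_kept(log_lines, label_prefix):
    """Count kept tasks whose label starts with label_prefix, grouped by reason."""
    counts = Counter()
    for line in log_lines:
        match = KEPT_RE.search(line)
        if match and match.group("label").startswith(label_prefix):
            counts[match.group("reason")] += 1
    return counts
```

Calling `count_kept(lines, "test-android-em-7.0-x86_64")` on the lines of a downloaded log gives a per-reason count that can be compared between a push that got optimized and one that did not.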
Comment 1•2 years ago
I think we need a new deployment of bugbug. Unfortunately automatic deployments have not been working lately because the task that is supposed to perform them is too slow at pushing images to Heroku and fails.
Comment 2•2 years ago
Here is an example: https://community-tc.services.mozilla.com/tasks/E8hxNhZ8RJqj0Ze8Ru47xQ.
Comment 3•2 years ago
I did a manual deployment, we should be good now. Let me know if this still happens.
Comment 4 (Reporter)•2 years ago
The issue persists, almost always all web-platform tasks get scheduled for Android.
Marco, can you have another look?
Comment 5•2 years ago
I had a look at 6e5782df6da147ae5c916097b12efcfae7149b5c.
Looking at the bugbug-push-schedules.json artifact, I see bugbug did not select any Android web-platform-tests (none of them are in "reduced_tasks" nor "tasks").
I also noticed the Android web-platform-tasks are not listed in "known_tasks". This means the scheduler will schedule them as it thinks they are new tasks: https://searchfox.org/mozilla-central/rev/11d085b63cf74b35737d9c036be80434883dd3f6/taskcluster/gecko_taskgraph/optimize/bugbug.py#161.
Bugbug builds the "known_tasks" list by using the target-tasks.json file from the decision task. It looks like the target-tasks.json file does not contain the Android web-platform-tasks. Ahal, do you know why they are missing?
Comment 6•2 years ago
I see them in there. It's the nofis variant that's scheduled to run. E.g. search for test-android-em-7.0-x86_64-qr/debug-geckoview-web-platform-tests-nofis in this recent target-tasks.json artifact.
If they weren't in target-tasks.json, then they wouldn't run in the first place (e.g. they wouldn't even make it to the optimization stage). Unless something depends on them...
Comment 7•2 years ago
Also, searching for test-android-em-7.0-x86_64-qr/debug-geckoview-web-platform-tests-nofis in the bugbug-push-schedules.json artifact from the same push does show that it is in known_tasks, but not in the selected tasks, so it must be getting picked for some other reason.
Comment 8•2 years ago
Right, CTRL+F in Firefox doesn't work when you view JSON in the pretty JSON viewer...
I see they are present in target-tasks.json and in bugbug-push-schedules even for the push I had looked at earlier.
Like you said, they are not selected by bugbug, so in theory they should not run.
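Since searching in the JSON viewer is unreliable, the same check can be scripted against a downloaded copy of the artifact. A minimal sketch, assuming the artifact is a JSON object with the "known_tasks", "tasks" and "reduced_tasks" keys mentioned in this bug, each either a list of labels or an object keyed by label; `load_schedules` and `task_membership` are hypothetical helper names:

```python
import json


def load_schedules(path):
    """Load a locally saved bugbug-push-schedules.json artifact."""
    with open(path) as f:
        return json.load(f)


def task_membership(schedules, label_substring,
                    keys=("known_tasks", "tasks", "reduced_tasks")):
    """Return, per key, the sorted labels that contain label_substring.

    Works whether a key holds a list of labels or a dict keyed by label,
    because iterating a dict yields its keys.
    """
    result = {}
    for key in keys:
        entries = schedules.get(key, [])
        result[key] = sorted(
            label for label in entries if label_substring in label
        )
    return result
```

For the case discussed here, `task_membership(schedules, "geckoview-web-platform-tests-nofis")` would be expected to list the labels under "known_tasks" and return empty lists for the other two keys.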
Comment 9•2 years ago
Actually, I spent a lot of time looking into this, and I think because we're using the bugbug-reduced-manifest strategy, the tasks and reduced_tasks values are unused. Instead you have to look at the group value. I notice there are quite a lot of high-confidence WPT groups in there. Could it be that every chunk just happens to have one?
But that raises the question: why is it only Android that is getting these manifests? IIRC, the platform-disperse strategy is supposed to spread the manifests out across platforms. Maybe that's where the real bug lies?
Marco, any thoughts?
Comment 10 (Reporter)•1 years ago
There are actually fewer manifests than for pushes which run everything, but the chunk count is the same - this explains ahal's assumptions.
The same behavior has been noticed for Linux debug browser-chrome: both M-swr and M-spi-nw have all chunks.
Comment 11•1 years ago
Sorry, I was away last week. Ahal, with chunking in the taskgraph it shouldn't be a problem, right? What you are describing should only happen without chunking in the taskgraph, where the chunks were fixed and we were selecting a chunk if it had at least one selected manifest. Am I missing something?
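The difference between the two modes can be illustrated with a toy model. This is a simplified sketch, not the taskgraph's actual chunking code: the function names are made up, and the dynamic case simply packs a fixed number of manifests per chunk rather than balancing by runtime.

```python
import math


def fixed_chunks_selected(chunks, selected):
    """Fixed chunking: a chunk runs if it holds at least one selected manifest.

    chunks is a list of manifest lists; returns 1-based chunk numbers.
    """
    return [
        number
        for number, manifests in enumerate(chunks, start=1)
        if any(manifest in selected for manifest in manifests)
    ]


def dynamic_chunk_count(selected, manifests_per_chunk):
    """Dynamic chunking (simplified): re-chunk only the selected manifests."""
    if not selected:
        return 0
    return math.ceil(len(selected) / manifests_per_chunk)


# Four fixed chunks of three manifests each; one manifest selected per chunk.
chunks = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
    ["d1", "d2", "d3"],
]
selected = {"a1", "b2", "c3", "d1"}

print(fixed_chunks_selected(chunks, selected))  # all four chunks: [1, 2, 3, 4]
print(dynamic_chunk_count(selected, 3))         # 2 chunks for 4 manifests
```

With fixed chunks, four selected manifests spread over four chunks keep every chunk; with re-chunking, the same four manifests fit into two. A push that has fewer manifests but an unchanged chunk count looks like the first case.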
Comment 12•1 years ago
Yeah, that's right. I checked and we should be setting chunks dynamically. So I'm confused about what's going on here :/