Open Bug 1987505 Opened 6 months ago Updated 5 months ago

Solve why GTest and cppunittests will sometimes go into a perma-fail state

Categories

(GeckoView :: General, task, P3)

All
Android
task

Tracking

(Not tracked)

REOPENED

People

(Reporter: olivia, Assigned: jmaher)

References

Details

(Whiteboard: [fxdroid] [geckoview][gv-grab-bag])

Attachments

(1 file)

Sometimes GeckoView GTest and cppunittests will go into a semi-permafail state. It is usually unclear from "Similar Jobs" that the failure is not related to the patch, because often the similar job history will be green.

It seems to be a possible infrastructure issue. This error makes it hard for developers to interpret what is going on and whether their test run is effectively green, because it looks like a new failure. These suites also run as part of ./mach try --preset android-geckoview

An intermediary step might be seeing if one of the linked bugs that is attached to the failure should be renamed or added, so it at least shows up in categorization more transparently.

This has happened several times recently:

Looking at the GTest-1proc, that's an infra issue -- Python packaging timeouts. That should probably go purple and retry.

Looking at the cppunit-1proc, that's because those tests make no sense against artifact builds. The test executables literally aren't packaged. I don't know if it's possible to strip them from the run, but they're not failures per se.

jmaher: can we stop running some test configurations against artifact builds, since they don't make sense?

Flags: needinfo?(jmaher)

gtest, cppunit, and jittest all require full builds (or someone to modify artifact builds to bundle the compiled bits).

we have a transform for this:
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/target_tasks.py#303

it seems valid still. That transform appears to only be referenced for ./mach try auto:
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/filter_tasks.py#47

if that is the case, ideally we can filter this out more broadly.

Flags: needinfo?(jmaher)

oddly enough (maybe this is a ./mach try preset thing), when I use --artifact, it never gets to the filter from comment 2. When I set my mozconfig via ./mach bootstrap to use artifact builds, it does reach the filter from comment 2, but the artifact build flag is not set and we exit early.

more exploration is needed.

bug 1695325 tried to implement this exact thing, but that was many years ago. I suspect how we define an artifact build has changed. :ahal, could you take a quick look at this bug and see if there is something obvious you see?

Flags: needinfo?(ahal)
Regressions: 1695325

so ./mach try auto supports filtering out non artifact builds:
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/filter_tasks.py#47

but right below that the filter for fuzzy/etc. doesn't seem to support it:
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/filter_tasks.py#54

trying to add it there, I find that the input parameters are empty ({}), so we never find the use-artifact-build flag:
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/target_tasks.py#303
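
A minimal sketch of why an empty parameters dict behaves like "not an artifact build". The function name, the `try_task_config` nesting, and the exact key spelling `use-artifact-builds` are assumptions for illustration, not copied from target_tasks.py:

```python
def wants_artifact_build(parameters):
    """Return True only if the try config explicitly requests artifact builds.

    Assumed layout: parameters["try_task_config"]["use-artifact-builds"].
    """
    # With parameters == {}, there is no try config at all, so this falls
    # back to an empty dict and the flag lookup below returns False.
    try_task_config = parameters.get("try_task_config") or {}
    return bool(try_task_config.get("use-artifact-builds", False))
```

With this shape, a filter that exits early when the flag is unset can never drop the artifact-incompatible tasks if it is handed empty parameters.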

I will wait for :ahal to chime in here.

Component: Extensions → General

I think the reason we don't support it for fuzzy is that it would be even more confusing if a user explicitly requested a task and then it didn't even show up at all. At least this way the task is orange and someone can tell them these tasks don't work with artifact builds.

I think what we want instead is for ./mach try fuzzy to detect when an artifact build is selected, and then not even offer these tasks in the selection window in the first place.
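
A rough sketch of that idea, assuming the selector has a list of task labels and already knows whether an artifact build was requested. The function name, the substring matching, and the label format are hypothetical; only the suite names come from this bug:

```python
# Suites named in this bug as needing compiled test binaries.
FULL_BUILD_ONLY = ("gtest", "cppunit", "jittest")


def selectable_tasks(labels, artifact_build):
    """Return the task labels that should be offered in the selection window."""
    if not artifact_build:
        # Full builds can run everything, so offer every task.
        return list(labels)
    # Hide tasks whose label mentions a suite that needs a full build.
    return [
        label
        for label in labels
        if not any(suite in label for suite in FULL_BUILD_ONLY)
    ]
```

Substring matching on labels is only for illustration; a real implementation would more likely key off task attributes.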

Flags: needinfo?(ahal)

the bug was originally written for ./mach try fuzzy --preset ... where things like gtest are specified. I guess it would be better to remove things at the ./mach try ... level instead of in the decision task.

Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Pushed by jmaher@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/b7180d4f836a https://hg.mozilla.org/integration/autoland/rev/6aeda8c3e0c5 Remove tasks from try selection list if artifact build and task doesn't support artifact. r=taskgraph-reviewers,hneiva
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 145 Branch
Pushed by ctuns@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/13fc01acfaaa https://hg.mozilla.org/integration/autoland/rev/03ecd94cfd27 Revert "Bug 1987505 - Remove tasks from try selection list if artifact build and task doesn't support artifact. r=taskgraph-reviewers,hneiva" as requested by jmaher on element.

Backed out as requested by jmaher on element.

Flags: needinfo?(jmaher)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 145 Branch → ---

here is a command that breaks with the above changes:
mach try fuzzy -q "web-platform-tests \!macosx \!shippable \!asan \!tsan" --rebuild 5 --artifact testing/web-platform/tests/html/semantics/interestfor/interestfor-css-properties.tentative.htm

:ahal, how can I make taskgraph generation take as input try_task_config data?

Flags: needinfo?(jmaher) → needinfo?(ahal)
Severity: -- → N/A
Priority: -- → P3

In CI, the project needs to contain the string "try" and it needs to be an "hg-push":
https://searchfox.org/firefox-main/source/taskcluster/gecko_taskgraph/decision.py#412

But if you just want to test things locally, edit:
https://searchfox.org/firefox-main/source/taskcluster/test/params/try-config.yml

and point to it with -p
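
The CI condition described above can be sketched like this. The function and argument names are assumptions for illustration and are not taken from decision.py:

```python
def should_apply_try_config(project, tasks_for):
    """True when the try task config would be picked up in CI.

    Mirrors the two conditions from the comment above: the project name
    contains "try" and the decision task was triggered by an hg push.
    """
    return "try" in project and tasks_for == "hg-push"
```

Any other combination would leave the try config unset, which is why editing the params file is the easier way to test locally.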

Flags: needinfo?(ahal)

I meant in a transform, so that we don't build those task graph nodes at all. Or is there a better way to do what I did that wouldn't screw up the taskgraph?

Flags: needinfo?(ahal)

I added some comments in the phab revision.

Flags: needinfo?(ahal)