Open Bug 1551889 Opened 5 years ago Updated 2 years ago

Android Test Verification mozinfo randomly selects host mozinfo instead of android about 1% of the time.

Categories

(Testing :: General, defect, P2)

Tracking

(Not tracked)

People

(Reporter: bc, Unassigned)

References

(Blocks 1 open bug)

Details

In bug 1522113 comment 6, gbrown noticed that the mozinfo was incorrect for one of the verification tests in bug 1522113 comment 4.

I added some debug logging and ran the android verification again against the tip of inbound without other changes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b09d0ce45fa6113645d6bb0c5494271ea090d091

Two of the jobs reproduced:

https://tools.taskcluster.net/groups/bgsqHRIGR7GvebDp8mVpwA/tasks/Fm7RUtm2QeSgP-ErfY2QBQ/runs/0/logs/public%2Flogs%2Flive.log

https://tools.taskcluster.net/groups/bgsqHRIGR7GvebDp8mVpwA/tasks/c3OXtzajTm-jQAzTEj8h_g/runs/0/logs/public%2Flogs%2Flive.log

It appears that in both of these cases, mozinfo picks up /builds/worker/checkouts/gecko/obj-x86_64-pc-linux-gnu/mozinfo.json instead of /builds/worker/workspace/build/tests/mozinfo.json, but I don't know why.

gbrown: Any idea?

Flags: needinfo?(gbrown)

ahal: Any idea why we are sometimes picking up the host mozinfo instead of the android mozinfo from the test?

Flags: needinfo?(ahal)

The worker nodes that show this problem are:

i-000acc29aec9605c3
i-007879985b614d112
i-00e36dbbd58f86504
i-0236a06099531601c
i-02373c65cd6cfb337
i-0340c378c9eb661b1
i-03bc2103ff604084f
i-0468f1b851d5b47f3
i-04afcca273719f362
i-04c4928b83d8689f6
i-0616d67d30effe347
i-06b45257b5bf779e9
i-07f333ffa69713562
i-080892e05e6ce49aa
i-0ce224394c7b5b8e0
i-0d201020a0b53edf9
i-0f974273e801c9842
i-0f9b75c7fc2b6d7b1

These are distinct from the workers that get the correct mozinfo.

coop: Do you know who could help me figure out if there is something hinky with these workers?

Flags: needinfo?(coop)

/builds/worker/checkouts is a cache location for all android test tasks:

 using cache "gecko-level-3-checkouts-v3-694222febc6321e83215" -> /builds/worker/checkouts

but android test tasks don't normally (never?) populate it, so I suppose it makes sense that we sometimes find content there unrelated to our task.

I am surprised to see mozinfo finding that location though: Somehow from_environment() is finding that location as topobjdir. Is that because we are running python from checkouts (bug 1195299)?

Flags: needinfo?(gbrown)

I bet you are right. I was trying to do more debugging in MozbuildObject and ran into a problem that I think shows the source checkout is stale.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5369e762c189730ea239c78407cc10ccef554ef

  File "/builds/worker/workspace/mozharness/mozinfo/mozinfo.py", line 265, in find_and_update_from_json
    build = MozbuildObject.from_environment(debug=debug)
TypeError: from_environment() got an unexpected keyword argument 'debug'

but https://hg.mozilla.org/try/file/b5369e762c189730ea239c78407cc10ccef554ef/python/mozbuild/mozbuild/base.py#l112
shows

def from_environment(cls, cwd=None, detect_virtualenv_mozinfo=True, debug=False):

It seems that the import at https://hg.mozilla.org/try/file/b5369e762c189730ea239c78407cc10ccef554ef/testing/mozbase/mozinfo/mozinfo/mozinfo.py#l263 is getting mozbuild.base from /builds/worker/checkouts/gecko, where the mach command lives, and not from /builds/worker/workspace/?
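One way to confirm which copy of a module is being imported is to log the file it was loaded from. The following is only a minimal sketch of such debug logging; the helper name module_origin is hypothetical and is not part of mozinfo or mozbuild.

```python
import importlib


def module_origin(name):
    """Return the file a module is loaded from, or None if it cannot be imported.

    Hypothetical debugging helper, not part of mozinfo or mozbuild.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Some modules (for example built-ins) have no __file__ attribute.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # In a task this would show whether mozbuild.base comes from the
    # checkout or from the workspace; it prints None if mozbuild is absent.
    print("mozbuild.base loaded from:", module_origin("mozbuild.base"))
```

If the printed path is under /builds/worker/checkouts/gecko rather than /builds/worker/workspace/, that would support the theory that the stale checkout is being imported.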

I'm going to do a try run with https://hg.mozilla.org/integration/mozilla-inbound/rev/93075ec49df3982c26873b822d762bd3d8863fad where the change to run mach from the checkout was backed out on inbound.

I'm not sure if my input is still useful here. Note that there are other issues with c3.xlarge-based instances over in bug 1545820.

Flags: needinfo?(coop)

I have seen this on m3 and c3 xlarge as well.

I did 40 runs for TV android 4.3 pgo and debug at https://treeherder.mozilla.org/#/jobs?repo=try&revision=9466b73e2b7d0f2f4f17407334f41fde0bf9fa47 and did not reproduce this issue even once.

I'll confirm that it is bug 1195299 by going back to the prior commit and testing that.

I wonder if I can work around this by ignoring the mozinfo.json in the source checkout somehow.
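A rough sketch of what such a workaround could look like, assuming we had a list of candidate mozinfo.json paths to choose from. The function names is_under and pick_mozinfo are hypothetical; this is not how mozinfo or MozbuildObject currently select the file.

```python
import os


def is_under(path, root):
    """Return True if path is root itself or lies somewhere below root."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return os.path.commonpath([path, root]) == root


def pick_mozinfo(candidates, ignored_roots):
    """Return the first candidate not under any ignored root, else None.

    Hypothetical helper: candidates is an ordered list of mozinfo.json
    paths, ignored_roots is a list of directories to skip, such as the
    source checkout cache.
    """
    for candidate in candidates:
        if not any(is_under(candidate, root) for root in ignored_roots):
            return candidate
    return None
```

With the two paths seen in the logs, pick_mozinfo would skip the file under /builds/worker/checkouts and return /builds/worker/workspace/build/tests/mozinfo.json.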

Does anyone have any pointers on how to run this locally? This is a pain to do solely via try.

I tested the prior commit at https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=7a243d2c9dab240ea7028abc01fe7b89857a4fab but did not reproduce the issue. I will try to find the cause.

Priority: -- → P2

After bisecting I found that the true rate is more like 1%. Using --rebuild 100 didn't work with such a low reproducibility rate and pointed to a completely unrelated bug.
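For reference, the arithmetic behind why a clean batch of runs proves little at this rate. This sketch assumes every run fails independently with the same probability, which is a simplification: the failures above appear tied to particular workers, so real runs are probably not independent.

```python
import math


def prob_at_least_one(rate, runs):
    """Probability of seeing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - rate) ** runs


def runs_needed(rate, confidence):
    """Smallest number of independent runs that shows at least one failure
    with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


if __name__ == "__main__":
    print(round(prob_at_least_one(0.01, 40), 2))   # 0.33
    print(round(prob_at_least_one(0.01, 100), 2))  # 0.63
    print(runs_needed(0.01, 0.95))                 # 299
```

Under that assumption, 100 runs at a 1% rate come back completely clean about 37% of the time, so a bisection step based on --rebuild 100 can easily mark a bad revision as good.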

Summary: Android Test Verification mozinfo randomly selects host mozinfo instead of android about 5% of the time. → Android Test Verification mozinfo randomly selects host mozinfo instead of android about 1% of the time.

Sorry for the late reply. It seems like you are on the right track: there is a lot of complicated logic in MozbuildObject.from_environment() to find a mozinfo.json, which depends on things like the $CWD and the active virtualenv (if any). If we could compare the CWD and virtualenv of a passing task with those of a failing one, it might shed some light.
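Something along these lines could be logged from both a passing and a failing task to make that comparison. It is only a sketch: the function names are hypothetical, and the set of values captured (cwd, VIRTUAL_ENV, sys.prefix, sys.executable, sys.path) is a guess at what matters to from_environment().

```python
import json
import os
import sys


def environment_snapshot():
    """Capture interpreter and working-directory state for later comparison."""
    return {
        "cwd": os.getcwd(),
        "virtual_env": os.environ.get("VIRTUAL_ENV"),
        "sys_prefix": sys.prefix,
        "sys_executable": sys.executable,
        "sys_path": list(sys.path),
    }


def diff_snapshots(passing, failing):
    """Return {key: (passing_value, failing_value)} for keys that differ."""
    keys = sorted(set(passing) | set(failing))
    return {
        key: (passing.get(key), failing.get(key))
        for key in keys
        if passing.get(key) != failing.get(key)
    }


if __name__ == "__main__":
    # Emit one JSON line that can be pulled out of the task's live.log.
    print("ENV-SNAPSHOT " + json.dumps(environment_snapshot(), sort_keys=True))
```

The JSON lines from two logs can then be loaded and passed to diff_snapshots to list only the values that differ.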

Flags: needinfo?(ahal)
Severity: normal → S3