Closed Bug 1621339 Opened 4 years ago Closed 4 years ago

Raptor tests still appear to hang when downloading conditioned profiles

Categories

(Testing :: Raptor, defect, P1)

Version 3

Tracking

(firefox75 fixed, firefox76 fixed)

RESOLVED FIXED
mozilla76

People

(Reporter: gbrown, Assigned: tarek)

References

Details

(Keywords: hang)

Attachments

(1 file)

I noticed the condprof download code checks the TASK_CLUSTER variable, using a progress bar when not running on TASK_CLUSTER. But it looks like TASK_CLUSTER is False on taskcluster:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=75005366efba0e35d7930cdfe978c0d7729a5653

So, I guess the progress bar is being used, unexpectedly, which might be "bad".
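For illustration, the kind of check being described could look like the sketch below. This is not the condprof code: the function names are hypothetical, and using the `TASK_ID` environment variable as the CI marker is an assumption. It only shows how a flag derived from the environment ends up enabling the progress bar whenever the marker is missing.

```python
import os


def running_on_taskcluster(env=None):
    """Guess whether we are inside a CI task (assumed marker: TASK_ID)."""
    env = os.environ if env is None else env
    return "TASK_ID" in env


def use_progress_bar(env=None):
    """Only draw an interactive progress bar outside of CI."""
    return not running_on_taskcluster(env)
```

If the marker variable is not exported to the process doing the download, `use_progress_bar()` returns True even on a CI worker, which would match the behavior observed above.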

(In reply to Geoff Brown [:gbrown] from comment #1)

> I noticed the condprof download code checks the TASK_CLUSTER variable, using a progress bar when not running on TASK_CLUSTER. But it looks like TASK_CLUSTER is False on taskcluster:

That's interesting, but correcting for it does not eliminate the hangs:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=543f31a5a2efe86548505ad1f4e8f9ad02835266

Tarek, can you please check that?

Flags: needinfo?(tarek)

The variable was used here, but I guess in Raptor it may not always be True? (Or that other example is buggy too.)

https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#1781

As for the hang, this is what I propose we do at this point: I am going to add some code that dumps the Python frames so we know exactly where it's blocked. Maybe that will help us understand what happens.
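A minimal sketch of such a frame dump, using only the standard library, might look like this. The helper name is hypothetical and this is not necessarily the code that was added to condprof; it simply formats the current stack of every live thread so a hang can be located.

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        name = names.get(thread_id, "unknown")
        chunks.append("Thread %s (%s):\n" % (name, thread_id))
        # format_stack() returns one string per frame, outermost first.
        chunks.extend(traceback.format_stack(frame))
    return "".join(chunks)
```

Calling this from a watchdog thread or a signal handler and printing the result shows where each thread is blocked. On Python 3, `faulthandler.dump_traceback_later()` offers a similar timeout-triggered dump.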

Assignee: nobody → tarek
Flags: needinfo?(tarek)
Priority: P2 → P1

I got some hangs (SSL handshake when getting TC secrets). More logs and a timeout on the TC call:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d8790cb668b6ba46db821c43e1ccd9744ce206e

I found the issue. This is triggered by trying to get a Taskcluster secret to obfuscate the logs -- a feature I added in the part that makes the conditioned profile and manipulates a Firefox account password.

Trying to call the TC proxy when doing the logging fails, and the error is hidden; after 100+ requests things hang.
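To make the failure mode concrete, here is a hedged sketch of a log obfuscator that avoids it. It is not the actual patch: `LogObfuscator`, `fetch_secret`, and `enabled` are hypothetical names. The idea is to resolve the secret at most once, skip the lookup entirely when it is not needed, and report a failed lookup instead of hiding it.

```python
import sys


class LogObfuscator:
    """Mask a secret in log lines, resolving the secret at most once."""

    def __init__(self, fetch_secret, enabled=True):
        self._fetch_secret = fetch_secret  # hypothetical callable, may raise
        self._enabled = enabled
        self._resolved = False
        self._secret = None

    def _get_secret(self):
        if not self._enabled:
            return None  # e.g. the download client, which has no secret to hide
        if not self._resolved:
            self._resolved = True  # remember the outcome, success or failure
            try:
                self._secret = self._fetch_secret()
            except Exception as exc:
                # Surface the failure once instead of silently retrying.
                sys.stderr.write("could not fetch secret: %r\n" % (exc,))
                self._secret = None
        return self._secret

    def clean(self, line):
        """Return the line with the secret masked, if one is known."""
        secret = self._get_secret()
        if secret:
            return line.replace(secret, "****")
        return line
```

With this shape, 100+ log calls trigger at most one request to the secrets service, and none at all when `enabled` is False.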

Building the patch now to solve this.

This fixes the hang caused by 100+ attempts to get a TC secret we don't even need (and can't get).

Try run of the condprof builder to make sure it works fine; then we can land this fix:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d571abc8b89e1335d8404b733fb07de923114eca

Pushed by tziade@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/90338f9d80b2
Don't get TC secrets when using the client r=gbrown
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76

This is working! There is a dramatic reduction in failures apparent in both blocked bugs. Thanks!

Can we please get this test-only patch uplifted to beta? It drastically reduces the number of aborted tasks in CI. Thanks.

Flags: needinfo?(sheriffs)
Whiteboard: [checkin-needed-beta]
Flags: needinfo?(sheriffs)