Raptor tests still appear to hang when downloading conditioned profiles
Categories
(Testing :: Raptor, defect, P1)
Tracking
(firefox75 fixed, firefox76 fixed)
People
(Reporter: gbrown, Assigned: tarek)
References
Details
(Keywords: hang)
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1618390 +++
+++ This bug was initially created as a clone of Bug #1613938 +++
Raptor tests appear to hang when downloading a conditioned profile.
Despite the (much appreciated!) efforts in earlier bugs, I still see examples of this every day in bug 1589796.
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291900307&repo=autoland&lineNumber=1042
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=292221666&repo=mozilla-central&lineNumber=1473
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291769695&repo=autoland&lineNumber=1818
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291916516&repo=autoland&lineNumber=768
Reporter | ||
Comment 1•4 years ago
I noticed the condprof download code checks the TASK_CLUSTER variable, using a progress bar when not running on TASK_CLUSTER. But it looks like TASK_CLUSTER is False on taskcluster:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=75005366efba0e35d7930cdfe978c0d7729a5653
So, I guess the progress bar is being used, unexpectedly, which might be "bad".
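For illustration, the kind of check being discussed could look like the sketch below. It is not the condprof code: the `TASK_ID` environment variable, `running_on_taskcluster` and `report_progress` are assumptions made up for this example. The point is only that a wrong detection result silently selects the interactive progress bar in CI.

```python
import os
import sys


def running_on_taskcluster(environ=None):
    """Return True when a task id is present in the environment.

    Assumption: TASK_ID is used here only as an illustrative marker; the real
    detection logic in condprof may differ.
    """
    environ = os.environ if environ is None else environ
    return bool(environ.get("TASK_ID"))


def report_progress(done, total, environ=None, stream=None):
    """Write download progress, choosing the output style by environment."""
    stream = sys.stdout if stream is None else stream
    if running_on_taskcluster(environ):
        # In CI, emit plain log lines that a log viewer can display.
        stream.write("downloaded %d/%d bytes\n" % (done, total))
    else:
        # Locally, redraw a single-line progress bar with a carriage return.
        width = 20
        filled = int(width * done / total) if total else width
        stream.write("\r[%s%s]" % ("#" * filled, "." * (width - filled)))
    stream.flush()
```

If the detection returns False on a CI worker, the second branch runs and the log receives carriage-return redraws instead of lines.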
Reporter | ||
Comment 2•4 years ago
(In reply to Geoff Brown [:gbrown] from comment #1)
> I noticed the condprof download code checks the TASK_CLUSTER variable, using a progress bar when not running on TASK_CLUSTER. But it looks like TASK_CLUSTER is False on taskcluster:
That's interesting, but correcting for it does not eliminate the hangs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=543f31a5a2efe86548505ad1f4e8f9ad02835266
Assignee | ||
Comment 4•4 years ago
The variable was used here, but I guess in raptor it may not always be True? (Or that other example is buggy too.)
https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#1781
As for the hang, this is what I propose we do at this point: I am going to add some code that dumps the Python frames so we know exactly where it's blocked. Maybe that will help us understand what happens.
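A minimal sketch of what such a frame dump can look like, using only the standard library. This is not the patch used in the try runs; `format_all_stacks` and `start_hang_watchdog` are hypothetical names. It relies on `sys._current_frames()` and a timer thread rather than signals, so it should also work on Windows.

```python
import sys
import threading
import traceback


def format_all_stacks():
    """Return a text dump of the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (names.get(ident, "?"), ident))
        lines.extend(l.rstrip("\n") for l in traceback.format_stack(frame))
    return "\n".join(lines)


def start_hang_watchdog(seconds, out=None):
    """Dump all thread stacks to `out` if still running after `seconds`."""
    out = sys.stderr if out is None else out

    def dump():
        out.write(format_all_stacks() + "\n")
        out.flush()

    timer = threading.Timer(seconds, dump)
    timer.daemon = True  # never keep the process alive just for the dump
    timer.start()
    return timer
```

Call `start_hang_watchdog(...)` before the suspect download and `cancel()` the returned timer once it completes. On Python 3, `faulthandler.dump_traceback_later` in the standard library provides a similar facility.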
Assignee | ||
Comment 5•4 years ago
try run with my hang info patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=87c270d5e0b98b234e6cb7b699863ad104471154
let's see what we get...
Assignee | ||
Comment 6•4 years ago
The hang did not reproduce. New run x15 (no Windows):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d698405dcedfa58e8135f8c4f994686e4edc0734
Assignee | ||
Comment 7•4 years ago
no luck so far. please, hang!! :)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=595a5dc15a8794d53cfb07ef9a122437ec540e17
Assignee | ||
Comment 8•4 years ago
run without my patch. will it hang?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d22174753ea5055aae26a93a883616605203f56
YES IT HANGS !!
Assignee | ||
Comment 9•4 years ago
one more, with a windows-compatible implementation
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ec8585b6b2a00beb8c5b96b75764a3f4376fb6d
Assignee | ||
Comment 10•4 years ago
I got some hangs (SSL handshake when getting TC secrets). New run with more logs and a timeout on the TC call:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d8790cb668b6ba46db821c43e1ccd9744ce206e
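As a rough illustration of putting a timeout on such a call, here is a generic sketch with `urllib.request`. It is not the Taskcluster client code; `fetch_with_timeout` and its default timeout are assumptions for the example. Without an explicit timeout, a stalled handshake or a silent server can block the caller indefinitely.

```python
import urllib.request


def fetch_with_timeout(url, timeout=10.0):
    """Return the response body, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        # Report the failure instead of hiding it.
        print("request to %s failed: %r" % (url, exc))
        return None
```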
Assignee | ||
Comment 11•4 years ago
I found the issue. This is triggered by trying to get a Taskcluster secret to obfuscate the logs -- a feature I added in the part that builds the conditioned profile and manipulates a Firefox account password.
The call to the TC proxy made while logging fails and the error is hidden, and after 100+ requests things hang.
Building the patch now to solve this.
Assignee | ||
Comment 12•4 years ago
This fixes the hang we get because we make 100+ calls trying to get a TC secret we don't even need (and can't get).
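The idea can be sketched as follows. This is not the landed patch: `PasswordObfuscator` and its `fetch_secret` callable are hypothetical, and the sketch only assumes that the download client never needs the secret while the profile builder does. The secret is looked up at most once, a failure is recorded rather than swallowed, and the client passes no fetcher at all.

```python
class PasswordObfuscator:
    """Masks a secret value in log messages (hypothetical helper)."""

    def __init__(self, fetch_secret=None):
        # fetch_secret: optional callable returning the password string.
        # The download client passes None, so no secret service is contacted.
        self._fetch_secret = fetch_secret
        self._password = None
        self._resolved = fetch_secret is None
        self.fetch_errors = []

    def _get_password(self):
        if not self._resolved:
            # Try exactly once, whether the lookup succeeds or fails.
            self._resolved = True
            try:
                self._password = self._fetch_secret()
            except Exception as exc:
                # Keep the error visible instead of hiding it.
                self.fetch_errors.append(exc)
        return self._password

    def clean(self, message):
        """Return `message` with the password, if known, masked out."""
        password = self._get_password()
        if password:
            return message.replace(password, "*" * 8)
        return message
```

With this shape, 100+ log calls trigger at most one lookup, and a failing lookup leaves a trace in `fetch_errors`.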
Assignee | ||
Comment 13•4 years ago
Assignee | ||
Comment 14•4 years ago
Assignee | ||
Comment 15•4 years ago
try run of the condprof builder to make sure that works fine, and then we can land this fix
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d571abc8b89e1335d8404b733fb07de923114eca
Comment 16•4 years ago
Pushed by tziade@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/90338f9d80b2 Don't get TC secrets when using the client r=gbrown
Comment 17•4 years ago
bugherder
Reporter | ||
Comment 18•4 years ago
This is working! There is a dramatic reduction in failures apparent in both blocked bugs. Thanks!
Comment 19•4 years ago
Can we please get this test-only patch uplifted to beta? It drastically reduces the number of aborted tasks in CI. Thanks.
Comment 20•4 years ago
bugherder uplift