Perma raptor youtube OS X 10.15 WebRender Shippable failing as exception with deadline exceeded
Categories
(Testing :: Raptor, defect, P2)
Tracking
(firefox-esr91 unaffected, firefox100 unaffected, firefox101 unaffected, firefox102 wontfix, firefox103 wontfix, firefox111 fixed)
Version | Tracking | Status |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox100 | --- | unaffected |
firefox101 | --- | unaffected |
firefox102 | --- | wontfix |
firefox103 | --- | wontfix |
firefox111 | --- | fixed |
People
(Reporter: imoraru, Assigned: aglavic)
References
Details
(Keywords: intermittent-failure, Whiteboard: [retriggered])
Attachments
(1 file)
The failure has no failure log.
Last green jobs were on May 18th.
Push where ytp-h264-p job failed for the first time.
Push where ytp-v9-p job failed for the first time.
Reporter
Comment 1•2 years ago
Hi Andrew! Can you please take a look at this?
Thank you!
Comment 2•2 years ago
Because these started to fail not with the same push but adjacent ones, could this be related to the macOS testing machines?
Comment 4•2 years ago
:dhouse, it looks like the Mac power testing machines are unresponsive, could you take a look?
generic-worker is failing because of errors from taskcluster like:
{
"code": "ResourceNotFound",
"message": "Worker pool releng-hardware/gecko-t-osx-1014-power does not exist\n\n---\n\n* method: registerWorker\n* errorCode: ResourceNotFound\n* statusCode: 404\n* time: 2022-06-01T15:23:45.161Z",
I'm reconfiguring them so they get connected.
:sparky they're claiming work again now. Please let me know if there are any problems.
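For triage, an error payload like the one quoted above can be checked programmatically. The following is a minimal sketch, assuming the payload is complete JSON carrying at least the `code` and `message` fields shown in this bug. The helper name `missing_worker_pool` is hypothetical and is not part of generic-worker or any Taskcluster client.

```python
import json
import re

# Hypothetical helper for illustration; not part of generic-worker or Taskcluster.
def missing_worker_pool(payload):
    """Return the worker pool named in a ResourceNotFound error, else None."""
    error = json.loads(payload)
    if error.get("code") != "ResourceNotFound":
        return None
    # The message wording is taken from the error quoted in this bug.
    match = re.search(r"Worker pool (\S+) does not exist", error.get("message", ""))
    return match.group(1) if match else None

# Sample built from the two fields quoted in this bug; the real payload may
# carry additional fields that are not shown here.
sample = json.dumps({
    "code": "ResourceNotFound",
    "message": "Worker pool releng-hardware/gecko-t-osx-1014-power does not exist\n\n---\n\n* method: registerWorker",
})
print(missing_worker_pool(sample))
```

A check like this would flag the stale 1014 pool name directly instead of leaving the job to time out with no failure log.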
Assignee
Comment 10•2 years ago
Hi :dhouse, we are still having issues. Can you investigate?
Comment 13•2 years ago
:andrej I fixed this. It was an issue with the scopes the workers need to pull from the 1015 queue. I set up some monitoring to check if this happens again.
Can you confirm if the mac power testers are working correctly now?
Comment 14•2 years ago
Hi :dhouse
Here is some mac power data. Is this what you were referring to? (as of posting this, looks like nothing since mid-May)
Comment 15•2 years ago
(In reply to Kash [:kshampur] ⌚EST from comment #14)
> Hi :dhouse
> Here is some mac power data. Is this what you were referring to? (as of posting this, looks like nothing since mid-May)
This task record is on the previous queue "releng-hardware/gecko-t-osx-1014-power"
:andrej can you switch these to run on the pool "releng-hardware/gecko-t-osx-1015-power" ?
If you want, I can just switch them to using the old queue name. Or do you want to rename the queue to remove the macos version like "gecko-t-osx-power"?
Comment 16•2 years ago
:sparky can you switch these to using the new pool name? It doesn't seem ideal to have the OS version hard-coded, so if you want to change the pool name, I can switch it, np (or switch back to using the 1014 pool name to just make this work).
Comment 18•2 years ago
:andrej, could you take care of making the change? We'd need to change these lines of code: https://searchfox.org/mozilla-central/search?q=t-osx-1014-power&path=&case=false&regexp=false
Assignee
Comment 21•2 years ago
What we are doing:
Switching mac power data to run on osx-1015 from osx-1014
Why:
Generic workers were failing and attempting to pull from osx-1014, so we are setting them to pull from the osx-1015 worker pool
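The switch amounts to replacing the old pool name wherever it appears in the task configuration. A rough sketch of that substitution follows, assuming the references are plain-text occurrences of `t-osx-1014-power` (the search string from comment 18). The function name and the sample line are illustrative only and are not taken from the landed patch.

```python
from typing import Tuple

OLD_POOL = "t-osx-1014-power"  # search string from comment 18
NEW_POOL = "t-osx-1015-power"  # pool the power tests should move to

# Hypothetical helper for illustration; the actual patch edited the files directly.
def switch_power_pool(text: str) -> Tuple[str, int]:
    """Replace every old pool reference and report how many were changed."""
    count = text.count(OLD_POOL)
    return text.replace(OLD_POOL, NEW_POOL), count

# Made-up config line, used only to exercise the helper.
sample_line = "worker-type: t-osx-1014-power"
updated, changed = switch_power_pool(sample_line)
print(updated, changed)
```

Returning the replacement count makes it easy to confirm that no reference to the old pool was missed.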
Comment 23•2 years ago
Pushed by aglavic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/016cbd986226
Use macosx-1015 for power tests instead of macosx-1014. r=dhouse
Comment 24•2 years ago
bugherder
Comment 25•2 years ago
Since nightly and release are affected, beta will likely be affected too.
For more information, please visit auto_nag documentation.
Comment 26•2 years ago
The patch landed in nightly and beta is affected.
:dhouse, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set status-firefox103 to wontfix.
For more information, please visit auto_nag documentation.
Comment 27•2 years ago
:andrej what do you think? I'm re-assigning to you since I do not know if it needs to be uplifted.
Assignee
Comment 28•2 years ago
I don't think this needs to be uplifted
Reporter
Comment 29•2 years ago
Hi Andrej! Can you please take a look? Should we reopen this bug?
Thank you!
Assignee
Comment 30•2 years ago
Hi dhouse,
Can you advise on the issue? You mentioned in my phabricator patch that the worker pool was not registered to taskcluster but that it should work now after you re-enabled the API key.
Comment 31•2 years ago
(In reply to Andrej (:andrej) from comment #30)
> Hi dhouse,
> Can you advise on the issue? You mentioned in my phabricator patch that the worker pool was not registered to taskcluster but that it should work now after you re-enabled the API key.
I got them working again last night: I checked and my api key was disabled again. I will check into it more this week. I think there is automation disabling the api key.
Assignee
Comment 33•2 years ago
No intermittent fails in the past 2 weeks, closing bug
Reporter
Comment 34•1 year ago
Hi Andrej! Can you please take a look at this? It is happening again, last green job was on Feb 10th: range from last green job.
Comment 36•1 year ago
The worker these ran on is offline and gecko-t-osx-1015-power pool lists no workers online. masterwayz, is this something you could look into as I see dhouse is no longer available? Thank you.
Comment 37•1 year ago
Will look into this! Will probably end up being later today due to some bitbar access permissions that someone else first needs to fix for me.
And yes, I'll have to be pinged instead of dhouse.
(leaving NI in place)
Comment 38•1 year ago
masterwayz, any updates on these workers?
Comment 40•1 year ago
I lack the access/context here so passing this along.
Comment 41•1 year ago
Michelle, I was able to ssh into these hosts.
Details at https://mozilla-hub.atlassian.net/wiki/spaces/ROPS/pages/179601939/Bitbar+Mac+Hosts.
Please let me know if you have any issues.
Comment 42•1 year ago
Ah right, those documents didn't exist yet, thanks for making them!
I will do this once I get access to that SSH key...
Comment 44•1 year ago
Jobs are picked up again, thank you for the fix.