Closed Bug 1147853 Opened 9 years ago Closed 9 years ago

Widespread "InternalError: Starting video failed" failures across all trees on AWS-based test instances

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Unassigned)

This is happening across all trees, including on pushes that had previously-green runs. AMI change or something maybe?

https://treeherder.mozilla.org/logviewer.html#?job_id=1224907&repo=mozilla-central

 1835 INFO TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/identity/test_peerConnection_peerIdentity.html | Error in test execution: InternalError: Starting video failed - expected PASS 

etc

All trees closed.
As mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1143018#c11, we normalized the basedir for slaves yesterday, both in the slavealloc db and in runslave.py. Assuming AMI generation happened properly last night, that change would have hit AWS today.
Assignee: nobody → coop
Status: NEW → ASSIGNED
(In reply to Chris Cooper [:coop] from comment #1)
> As mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1143018#c11, we
> normalized the basedir for slaves yesterday, both in the slavealloc db and
> in runslave.py. Assuming AMI generation happened properly last night, that
> change would have hit AWS today.

*However*, we didn't remove the old dir or any references that pointed to it, so the resources should still be there. The tests might be relying on a relative path, though.
Bug 1143018 is implicated in that it moved where the various buildbot state files (twistd.pid, logs) are created from /builds/slave/talos-slave to /builds/slave.

Morgan is looking into whether runner is failing because of this.
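
To illustrate the relative-path concern raised above (a hypothesis only, not a confirmed cause in this bug), here is a minimal sketch of how the same relative reference resolves to a different location once the basedir moves from /builds/slave/talos-slave to /builds/slave. The `relative_ref` value and the `resolve` helper are hypothetical and chosen for illustration; the sketch also assumes the process's working directory follows the basedir.

```python
import posixpath
from pathlib import PurePosixPath

# Basedirs named in this bug: the old and the normalized location.
OLD_BASEDIR = PurePosixPath("/builds/slave/talos-slave")
NEW_BASEDIR = PurePosixPath("/builds/slave")


def resolve(basedir, relative):
    """Resolve `relative` against `basedir` lexically (no filesystem access)."""
    return PurePosixPath(posixpath.normpath(str(basedir / relative)))


# Hypothetical relative reference, not taken from the actual test harness.
relative_ref = "../talos-data/resource"

old_target = resolve(OLD_BASEDIR, relative_ref)
new_target = resolve(NEW_BASEDIR, relative_ref)

print(old_target)
print(new_target)
print("same target:", old_target == new_target)
```

If anything in the harness resolves paths this way, leaving the old directory in place would not help, because the lookup would no longer land inside it.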
In the absence of an obvious cause, we're rolling back the AMIs to yesterday's versions while we continue to investigate. Existing instances should self-terminate and restart shortly.
Rolling back the AMI seems to have worked. Trees reopened.
Depends on: 1149740
Depends on: 1149580
Assignee: coop → nobody
Status: ASSIGNED → NEW
Component: Buildduty → General Automation
QA Contact: bugspam.Callek → catlee
Severity: blocker → critical
Resolved by https://bugzil.la/1149580.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General