Bug 967816 (Closed)
Opened 11 years ago
Closed 11 years ago
Gaia-ui-tests on Linux nearly perma-fail since Feb 2
Categories
(Firefox OS Graveyard :: Gaia::UI Tests, defect)
Tracking
(Not tracked)
RESOLVED FIXED
People
(Reporter: jgriffin, Unassigned)
Since Feb 2, the gaia-ui-tests on B2G desktop on Linux have become nearly perma-fail; they time out at entirely random places. The same tests running on OS X are not affected.
Looking at b2g-inbound, it looks like this problem began around https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=gaia-ui&showall=1&rev=83a3ef9b2144, but I'm doing some retriggers before and after to attempt to confirm.
Reporter
Comment 1•11 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #0)
> Looking at b2g-inbound, it looks like this problem began around
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=gaia-
> ui&showall=1&rev=83a3ef9b2144, but I'm doing some retriggers before and
> after to attempt to confirm.
Er, that's on inbound. On b2g-inbound, there's enough noise to make the situation a little unclear, but I'm requesting some retriggers there too.
Comment 2•11 years ago
If this is a crash, then bug 949028 will help when/if I can get it working.
Depends on: 949028
Reporter
Comment 3•11 years ago
I'm not convinced this isn't an infrastructure issue. Retriggers on inbound from Sunday are coming back much redder than the original runs, which would indicate an infrastructure change that was made after those initial runs. See e.g. https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=gaia-ui&rev=fac849dd7be9 and earlier pushes, for which I've done a bunch of retriggers.
I know we moved to a different AWS node type, but I haven't had a firm answer as to when exactly that happened. Catlee, rail, can you tell us?
Flags: needinfo?(rail)
Flags: needinfo?(catlee)
Reporter
Comment 4•11 years ago
s/which would indicate an infrastructure change/which _could_ indicate an infrastructure change/
Comment 5•11 years ago
Migration from m1.medium to m3.medium happened in 2 steps:
1) on-demand slaves (tst-linux*-ec2-xxx) were switched to m3.medium around Jan 28-29
2) spot slaves (tst-linux*-spot-xxx) were switched to m3.medium after Feb 2 (http://hg.mozilla.org/build/cloud-tools/rev/6487dca66616)
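For reference, a minimal sketch of how one could verify which instance type a given slave pool is actually running, assuming AWS API access and the tst-linux* Name-tag convention above. It uses boto3 rather than the tooling releng used at the time, and the region and tag patterns are assumptions:

```python
# Sketch (not the actual cloud-tools change): tally EC2 instance types for the
# Linux test slaves, assuming Name tags matching tst-linux*-ec2-* (on demand)
# and tst-linux*-spot-* (spot).
from collections import Counter
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption

counts = Counter()
for page in ec2.get_paginator("describe_instances").paginate(
    Filters=[{"Name": "tag:Name",
              "Values": ["tst-linux*-ec2-*", "tst-linux*-spot-*"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            counts[instance["InstanceType"]] += 1

print(counts)  # e.g. Counter({'m3.medium': 180, 'm1.medium': 12})
```

Comparing the on-demand and spot tallies against the Jan 28-29 and post-Feb 2 dates above would show which pool was still on m1.medium when the failures spiked.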
Flags: needinfo?(rail)
Flags: needinfo?(catlee)
Reporter
Comment 6•11 years ago
That timeline looks pretty consistent with the pattern of increased (nearly perma-red) failures we're seeing, although from the specs it's hard to see how the new instance type would be causing these problems.
One way to tell would be to switch spot instances back to m1.medium for a few days to see if our failure rate comes back down.
In tandem, Andreas Tolfsen on our team is going to investigate this on one of the on-demand slaves to see if we can get more information about the failures.
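As a rough illustration of the "does the failure rate come back down" check, a sketch that compares failure rates from two batches of runs; the counts are invented and would have to be tallied by hand from TBPL retriggers:

```python
# Illustrative only: compare gaia-ui-tests failure rates before and after
# reverting the spot slaves to m1.medium. All counts below are made up.
def failure_rate(failed, total):
    return failed / total if total else 0.0

m3_failed, m3_total = 46, 50   # hypothetical runs on m3.medium spot slaves
m1_failed, m1_total = 4, 50    # hypothetical runs after the revert

print(f"m3.medium: {failure_rate(m3_failed, m3_total):.0%} failed")
print(f"m1.medium: {failure_rate(m1_failed, m1_total):.0%} failed")
```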
Comment 7•11 years ago
Hidden on trunk.
Reporter
Comment 8•11 years ago
These are green again (except for unrelated bug 970166, which I'm landing a fix for today); can we unhide them?
Comment 9•11 years ago
The failure in https://tbpl.mozilla.org/php/getParsedLog.php?id=34567438&tree=Mozilla-Inbound, now that they're back on m1.medium, looks like it's hitting more than 10% of runs, from a quick glance.
Comment 10•11 years ago
Fair enough.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED