Closed Bug 1038941 Opened 10 years ago Closed 10 years ago

try r3.{x,}large for Android 2.3 reftests and friends

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86 macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: kmoir, Assigned: kmoir)

References

Details

from #releng

rail	kmoir: retry rate is kind of high on c3.xlarges, as one of the next steps we should consider using other types, esp r3.{x,}large
	rail	they have better availability and moar RAM!
	kmoir	rail: hmm, okay
	rail	the rate now is 5-10%, for other types it's usually less than 1%
	kmoir	rail: I guess there isn't a way to model the retry rate other than trying them in production. They were all green when I ran them in staging but I only used one slave so it wasn't a good test
	kmoir	rail: is that because our instances are getting killed by other bids?
	rail	yup
	rail	you tested on on-demand instances
	kmoir	yeah, I know it wasn't a good test
	rail	it was actually
	kmoir	just tested that the tests completed really
	kmoir	rail: how do you tell the retry rate?
	rail	if we can test the same on r3 instances it would help
	rail	the simplest way is to look at last 100 jobs
	rail	https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=tst-linux64-spot for m1.medium
	rail	https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=tst-emulator64-spot for c3.xlarge
	rail	tst-emulator64-spot lists all slaves with that pattern (a hack!)
	rail	it may take some time to show them
...
kmoir	rail: okay I'll update the bug or open a new one. We'll probably need to keep that slave class in because there are b2g tests that require it
	kmoir	if the r3.large or r3.xlarge work out
	catlee	r3.xlarge is very similar to c3.xlarge
	catlee	tests should Just Work™

...

kmoir	rail: so why don't we just up our bid price so we have less failures, the r3.xlarge instance prices are more expensive
	rail	not the spot prices
	rail	the current price is ~80% on-demand
	rail	if we bump there will be no reason to use spot instances...
	aki	do you have one global price, or one price per node type?
	catlee	per node type
	aki	ok
	kmoir	ah okay thanks for the explanation
	mdas	hello, releng! would I be able to gain access to a windows slave to see why my patch was backed out and to fix it?
	catlee	but we won't bid on it unless it's below 80% of our max, which is 80% of the on-demand limit
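catlee's description of the bidding rule (one max price per instance type, and no bid unless the spot price is low enough) can be sketched as a per-type check. This is a minimal Python sketch under one reading of that rule: the max bid is 80% of the type's on-demand price, and a bid is placed only while the spot price is below 80% of that max, i.e. below 64% of on-demand. The names `InstanceType`, `max_bid`, and `should_bid` and the $1.00/hour price are hypothetical, not the actual releng tooling or real EC2 prices.

```python
from dataclasses import dataclass

# Assumed reading of the rule described in #releng; these fractions are
# illustrative, not taken from the actual releng configuration.
MAX_BID_FRACTION = 0.8        # max bid = 80% of the on-demand price
BID_THRESHOLD_FRACTION = 0.8  # only bid while spot < 80% of the max bid


@dataclass
class InstanceType:
    """Hypothetical record; prices are tracked per instance type."""
    name: str
    on_demand_price: float  # USD per hour


def max_bid(itype: InstanceType) -> float:
    """Per-type ceiling derived from that type's on-demand price."""
    return MAX_BID_FRACTION * itype.on_demand_price


def should_bid(itype: InstanceType, spot_price: float) -> bool:
    """Return True only when the spot price is below 80% of the max bid."""
    return spot_price < BID_THRESHOLD_FRACTION * max_bid(itype)


# Hypothetical on-demand price of $1.00/hour, chosen only for round numbers.
r3 = InstanceType("r3.xlarge", on_demand_price=1.00)
print(round(max_bid(r3), 2))   # 0.8
print(should_bid(r3, 0.80))    # False: spot at ~80% of on-demand
print(should_bid(r3, 0.50))    # True
```

Under this reading, a spot price sitting around 80% of on-demand (the level rail mentions) is already above the bidding threshold, which is why simply raising the bid would erode the reason for using spot instances at all.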
Assignee: nobody → kmoir
So the current retry rate is around 0.89%

https://wiki.mozilla.org/Mobile/Testing/07_30_14#A_Team

which suggests that the current instance type is holding up well. So I'm going to close this bug.
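rail's suggestion to "look at last 100 jobs" amounts to counting retried jobs in a sliding window, which is the kind of figure quoted as 0.89% above. A minimal Python sketch, assuming job results are available oldest-first as plain strings with `"retry"` marking a retried job; the function name and result labels are hypothetical and do not reflect the slave_health report's actual data format.

```python
from collections import deque
from typing import Iterable


def retry_rate(results: Iterable[str], window: int = 100) -> float:
    """Fraction of the last `window` job results that were retries.

    Assumes results arrive oldest-first and that "retry" marks a retried job
    (a hypothetical label, not the report's actual format).
    """
    recent = deque(results, maxlen=window)  # deque keeps only the newest `window`
    if not recent:
        return 0.0
    return sum(1 for r in recent if r == "retry") / len(recent)


history = ["success"] * 95 + ["retry"] * 5
print(retry_rate(history))  # 0.05
```

Five retries in the last 100 jobs gives 0.05, the low end of the 5-10% range rail reported for c3.xlarge; a 100-job window is coarse, so a rate below 1% only becomes distinguishable from zero once at least one retry shows up in the window.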
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard