Closed
Bug 818833
(toodamnhigh!)
Opened 12 years ago
Closed 11 years ago
Pending count for Linux32 test slaves is too high
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: joduinn)
References
Details
(Keywords: sheriffing-P1, Whiteboard: [capacity])
Attachments
(3 files)
patch, 2.62 KB | bhearsum: review+, bhearsum: checked-in+
patch, 2.58 KB | bhearsum: review+, Callek: checked-in+
patch, 1.11 KB | bhearsum: review+, Callek: checked-in+
I would imagine this is a combination of:
* "Linux used to be the fastest platform to request on Try, so if I'm testing just one platform, I'll request linux (32) in my trychooser syntax"
* B2G emulator uses linux slaves

Non-try: Pending test(s) @ Dec 06 02:05:02
linux (70): 25 mozilla-aurora, 24 mozilla-inbound, 11 ionmonkey, 10 mozilla-central
linux64 (22): 9 mozilla-beta, 8 ionmonkey, 5 mozilla-inbound
mac10.6-rev4 (13): 13 mozilla-inbound
mac10.7 (13): 13 mozilla-inbound
mac10.8 (2): 2 mozilla-inbound
winxp (5): 5 ionmonkey

Try: Pending test(s) @ Dec 06 02:05:02
linux (1057): 1057 try
linux64 (14): 14 try
win7 (89): 89 try
winxp (27): 27 try
Reporter
Comment 1•12 years ago
jgriffin, is there a bug filed for trying to use the emulator on linux64 slaves too? Happy to file one if not :-)
Flags: needinfo?(jgriffin)
Comment 2•12 years ago
We should really attempt to move them to AWS. The rev3 minis won't scale even if you split them among two of the platforms. Can we encourage people to use fedora64 on the try server?
Comment 3•12 years ago
So far the pool looks OK; we don't have many hung or waiting-for-reboot test slaves. The running jobs look sane as well.
Comment 4•12 years ago
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #1)
> jgriffin, is there a bug filed for trying to use the emulator on linux64
> slaves too? Happy to file one if not :-)

No, there's no such bug yet.
Flags: needinfo?(jgriffin)
Comment 5•12 years ago
Currently I have to debug marionette code on inbound. That's bad! I pushed to try, but even after 24 hours the marionette tests were still pending. 16 out of 35 tests fail locally on my linux machine on m-c tip and nobody knows why. We need try-server coverage for marionette tests!
Reporter
Comment 6•12 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #4)
> (In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #1)
> > jgriffin, is there a bug filed for trying to use the emulator on linux64
> > slaves too? Happy to file one if not :-)
>
> No, there's no such bug yet.

Filed bug 818968.

(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #0)
> I would imagine this is a combination of:
> * "Linux used to be the fastest platform to request on Try, so if I'm
> testing just one platform, I'll request linux (32) in my trychooser syntax"

Posted to dev.platform to try and reverse this habit:
https://groups.google.com/d/msg/mozilla.dev.platform/XcVw9IeUXVU/wvHM0OjCN5MJ
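For readers unfamiliar with the habit being discussed: moving a try push from the overloaded 32-bit pool to 64-bit Linux was a one-token change to the TryChooser line in the push's commit message. A sketch (the flag values here are illustrative, not taken from this bug; the exact platform names depend on the TryChooser configuration of the era):

```
# Requests tests on 32-bit Linux, the overloaded pool:
try: -b do -p linux -u all -t none

# Requesting linux64 instead lands the jobs on the less-contended pool:
try: -b do -p linux64 -u all -t none
```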
Whiteboard: [buildduty] → [buildduty][capacity]
Reporter
Comment 7•12 years ago
At a similar time to yesterday, we're now up to 1500 pending linux32 Try jobs (from ~1000) :-(
Reporter
Comment 8•12 years ago
We now have Try jobs that are still pending after 1 day 20 hours :-(
Reporter
Comment 9•12 years ago
Mandatory meme :-) http://www.quickmeme.com/meme/3s585s/
Comment 10•12 years ago
The current fix in line for this is turning off linux32 desktop tests on m-c and project branches. Newsgroup posts incoming.
Comment 11•12 years ago
(In reply to Aki Sasaki [:aki] from comment #10)
> The current fix in line for this is turning off linux32 desktop tests on m-c
> and project branches. Newsgroup posts incoming.

Maybe m-i and branches? m-c is not so overloaded.
Comment 12•12 years ago
That would just lead to them being hidden.
Assignee
Comment 13•12 years ago
Totally true, and only going to get worse as more B2G test suites are figured out and start running in production. Meanwhile, new hardware is still months out. We've proposed some short-term plans for reducing load on linux32 test slaves in the dev.planning newsgroup and the dev platform meetings (last week and today). Details are still being worked out, so grabbing this for now.
Assignee: nobody → joduinn
Reporter
Updated•12 years ago
Keywords: sheriffing-P1
Reporter
Comment 14•12 years ago
B2G marionette-webapi is getting broken quite frequently (and is broken again this morning), and the high levels of coalescing often mean I've had to close the tree to bisect, due to the large regression range. I suspect the bustage would have been more obvious (and thus, in today's case, not left in for 45 pushes) if more jobs had been run, with the oranges showing on TBPL. Thank you for looking at this :-)
Reporter
Comment 15•12 years ago
Release branches keep on stealing linux32 test slaves (meaning the marionette retriggers on inbound are still pending and holding inbound closed), so I've temporarily closed aurora, beta, b2g18, esr18, esr10 to ensure we can get inbound open sooner rather than later.
Reporter
Comment 16•12 years ago
There are ~4 linux32 machines that haven't taken a job in anywhere from 10hrs to a day+. Could you give them a kick? :-) http://build.mozilla.org/builds/last-job-per-slave.html
Comment 17•12 years ago
The chronic issue of not having enough capacity is not something buildduty can deal with. We'll continue to deal with it acutely by unsticking slaves and the like, but I'm removing [buildduty] because this is unactionable for buildduty.
Whiteboard: [buildduty][capacity] → [capacity]
Comment 18•12 years ago
I'm currently going through the buildduty queue and the last-job-per-slave list and kicking machines. I've already fixed up a couple of the fed ones.
Reporter
Comment 19•12 years ago
That's great - thank you :-)
Assignee
Comment 20•12 years ago
At this point, we've done the following:
1) adjusted priority of b2g jobs
2) disabled any known-broken test jobs (which were just wasting cpu cycles)
3) scavenged additional test machines from others

...which helped. A bit. But not enough to offset:
1) the traditional busiest week of the year being the first week after Christmas/New Years, i.e. now
2) the b2g workweek in progress this week, which is also the last one before 15jan, so a spike in b2g traffic
3) the lack of replacement machines to keep up with load, which are still some way off from being delivered/online

As a short-term emergency move, we're disabling linux32 desktop test jobs effective immediately until we get through this week.

Note: given today's FF18.0/FF10esr/FF17esr releases, we will leave linux32 desktop test jobs enabled on mozilla-aurora/beta/release/esr10/esr17. These branches combined are ~7% of load, so not significant, but they are important to have in case we need to chemspill.

Note: this change is for linux32 *test* jobs only. linux32 builds continue as usual, as do linux64 builds and tests.
Comment 21•12 years ago
Attachment #699786 - Flags: review?(bhearsum)
Updated•12 years ago
Attachment #699786 - Flags: review?(bhearsum) → review+
Comment 22•12 years ago
Comment on attachment 699786 [details] [diff] [review]
[configs] per c#20

This was landed and put into production a couple of hours ago.
Attachment #699786 - Flags: checked-in+
Assignee
Comment 23•12 years ago
cjones: from newsgroups, there was a question about whether B2G still needed the crashtest-ipc test suite run on linux32, or if this was now covered by other suites. Per comment#20, we disabled linux32 test suites, including crashtest-ipc, on most branches last night, in order to improve b2g test waittimes. Let us know if you need this reenabled.
Assignee
Comment 24•12 years ago
Per bmoss, the volume and urgency of b2g checkins has decreased to the point that we are now OK to start re-enabling the linux32 tests that we disabled last week. The linux32 desktop tests should be back live in production soon, and certainly sometime today.
Comment 25•12 years ago
Attachment #701849 - Flags: review?(bhearsum)
Updated•12 years ago
Attachment #701849 - Flags: review?(bhearsum) → review+
Comment 26•12 years ago
Comment on attachment 701849 [details] [diff] [review]
[configs] re-enable

http://hg.mozilla.org/build/buildbot-configs/rev/98b27f79a36d
Attachment #701849 - Flags: checked-in+
Comment 27•12 years ago
This is in production.
Comment 28•12 years ago

Sorry for the reply lag.

(In reply to John O'Duinn [:joduinn] from comment #23)
> cjones: from newsgroups, there was a question about whether B2G still needed
> the crashtest-ipc test suite run on linux32, or if this was now covered by
> other suites.

B2G doesn't (directly) need crashtest-ipc on linux32. However, those tests, along with reftest-ipc, are the only thing that keeps cross-process graphics somewhere close to working on desktop builds. That's not a shipping configuration, but it's important for developers. There's also a project on the back burner that wants this.
Reporter
Comment 29•12 years ago
Linux32 try pending counts are pretty bad again: http://builddata.pub.build.mozilla.org/reports/pending/pending_test_try_day.png
Updated•12 years ago
Alias: toodamnhigh!
Comment 30•12 years ago
(In reply to Justin Wood (:Callek) from comment #26)
> Comment on attachment 701849 [details] [diff] [review]
> [configs] re-enable
>
> http://hg.mozilla.org/build/buildbot-configs/rev/98b27f79a36d

This never re-enabled tests for the Thunderbird tree(s).
Comment 31•12 years ago
with apologies to the TB team
Attachment #707091 - Flags: review?(bhearsum)
Updated•12 years ago
Attachment #707091 - Flags: review?(bhearsum) → review+
Comment 32•12 years ago
Comment on attachment 707091 [details] [diff] [review]
[configs] v1 - re-enable for TB as well.

http://hg.mozilla.org/build/buildbot-configs/rev/b30523f75d91
Attachment #707091 - Flags: checked-in+
Assignee
Comment 33•12 years ago
Yesterday we did 48,302 test jobs. Help is on the way: bug#835955 should be live in production before the end of this week.
Depends on: 835955
Comment 34•11 years ago
still critical?
Comment 35•11 years ago
Neither critical nor an issue.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Component: General Automation → General