Closed
Bug 847868
Opened 12 years ago
Closed 12 years ago
WinXP and Win7 test slaves backed up on Try
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
See bottom graph on:
http://builddata.pub.build.mozilla.org/reports/pending/pending.html
Even though we're at the quiet time of day there are still a lot of WinXP/Win7 jobs pending.
win7 (433)
433 try
winxp (298)
298 try
What's changed recently? WinXP/Win7 jobs are not normally the longest pole for Try end-to-end times, but they are at the moment.
Do some Windows machines need a kick? I can only see ~6-8 Windows machines on http://build.mozilla.org/builds/last-job-per-slave.html that look like they may have hung.
Catlee, can you find someone to look at this? :-)
Flags: needinfo?(catlee)
Comment 1•12 years ago
https://build.mozilla.org/builds/pending/running.html looks like we're consistently running quite a few winxp/win7.
Could it be that something has regressed test times recently?
We've also fixed branch prioritization so now try really is prioritized lower than most other branches.
Flags: needinfo?(catlee)
Comment 2•12 years ago
https://secure.pub.build.mozilla.org/buildapi/running is lying about some of the test runs. It claims some jobs have been running for more than a day, but the machines have moved on to other work since then.
We had some networking problems over the weekend, so perhaps machines are wedged because of that.
Comment 3•12 years ago
per irc w/avih:
He pushed https://tbpl.mozilla.org/?tree=Try&rev=88889739736c to try:
1) builds for all OS went fine
2) tests for osx and linux went fine
3) tests for win7 and winXP are backlogged.
Adding info here to help debugging....
Comment 4•12 years ago
Well, I'll note that on https://tbpl.mozilla.org/?tree=Try&rev=95f02e4036d4, the OSX tests (other than 10.8, which was a bit better) took around 10 hours to start. Windows was worse, but Mac was pretty horrible. And Fedora32 took around 12 hours.
17 hours later, my Try (pushed @ 10:30pm) is still waiting for Win7 results:
https://tbpl.mozilla.org/?tree=Try&rev=0d841f84764a
WinXP ran at around 9am (~11.5 hours after push)
Comment 5•12 years ago
(In reply to Randell Jesup [:jesup] from comment #4)
> Well, I'll note that on https://tbpl.mozilla.org/?tree=Try&rev=95f02e4036d4,
> the OSX tests (other than 10.8, which was a bit better) took around 10 hours
> to start. Windows was worse, but mac was pretty horrible. And Fedora32
> took around 12 hours
>
> 17 hours later, my Try (pushed @ 10:30pm) is still waiting for Win7 results:
> https://tbpl.mozilla.org/?tree=Try&rev=0d841f84764a
>
> WinXP ran at around 9am (~11.5 hours after push)
(after mid-air)....per irc w/jesup:
rjesup pushed https://tbpl.mozilla.org/?tree=Try&rev=0d841f84764a to try last night @ Mon Mar 4 22:03:41 2013 PST:
1) builds for all OS completed
2) tests for osx and linux completed
3) tests for win7 and winXP are backlogged.
Comment 6•12 years ago
We actually publish the pending jobs, in a table that's sortable by clicking on the "submitted at" column header, at https://secure.pub.build.mozilla.org/buildapi/pending - you don't really need IRC to find out that the current Win7 backlog is 18.5 hours and the current WinXP backlog is 14.5 hours. You'll get it anyway, but you don't *need* it ;)
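For anyone who would rather script that than sort the table by hand, here is a minimal sketch of the same calculation: the backlog per platform is just the age of the oldest job still pending. The (platform, submitted-at) pairs and the `now` timestamp below are made-up sample data chosen to match the figures above; they are not the real buildapi schema or output.

```python
from datetime import datetime, timedelta


def backlog_hours(pending, now):
    """Return {platform: age in hours of the oldest pending job}.

    `pending` is an iterable of (platform, submitted_at) pairs. This
    shape is an assumption for illustration, not the buildapi format.
    """
    oldest = {}
    for platform, submitted_at in pending:
        # Keep the earliest submission time seen for each platform.
        if platform not in oldest or submitted_at < oldest[platform]:
            oldest[platform] = submitted_at
    return {p: (now - t).total_seconds() / 3600.0 for p, t in oldest.items()}


# Hypothetical sample data.
now = datetime(2013, 3, 5, 17, 0)
pending = [
    ("win7", now - timedelta(hours=18, minutes=30)),
    ("win7", now - timedelta(hours=2)),
    ("winxp", now - timedelta(hours=14, minutes=30)),
]

for platform, hours in sorted(backlog_hours(pending, now).items()):
    print("%s backlog: %.1f hours" % (platform, hours))
```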
Comment 7•12 years ago
(In reply to Phil Ringnalda (:philor) from comment #6)
> ... that the current Win7 backlog is 18.5 hours ...
20 hours now. Either jobs aren't being dequeued at all, or they're being dequeued more slowly than new jobs are queued.
Has anything changed recently that might be causing this?
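To illustrate the second possibility, here is a toy model (not anything releng actually runs): if more jobs are queued per hour than are dequeued, the pending count only ever climbs. The hourly rates are invented for the example; only the starting count of 433 pending win7 jobs comes from this bug.

```python
def simulate_backlog(start, queued_per_hour, dequeued_per_hour, hours):
    """Return the pending-job count at the end of each hour.

    Assumes constant hourly rates, which is a simplification.
    """
    backlog = start
    history = []
    for _ in range(hours):
        # The queue cannot go below empty.
        backlog = max(0, backlog + queued_per_hour - dequeued_per_hour)
        history.append(backlog)
    return history


# Hypothetical rates: 60 jobs queued and 50 dequeued per hour.
print(simulate_backlog(433, 60, 50, 3))
# Swap the rates and the backlog drains instead.
print(simulate_backlog(433, 50, 60, 3))
```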
Comment 8•12 years ago
(In reply to Avi Halachmi (:avih) from comment #7)
> (In reply to Phil Ringnalda (:philor) from comment #6)
> > ... that the current Win7 backlog is 18.5 hours ...
>
> 20 hours now. It either isn't dequeued, or dequeues slower than jobs are
> queued.
>
> Anything changed recently which might be causing this?
Nothing that we've found at this point, but we're definitely looking into it!
Comment 9•12 years ago
We're back down to more usual levels of pending jobs on try now. I'm still not certain of the cause of this spike in wait time.
Severity: critical → normal
Comment 10•12 years ago
We had a chance to catch up with the tree closure yesterday.
Reporter
Comment 11•12 years ago
It doesn't help too much that our average test runtime is greatest on Windows:
http://brasstacks.mozilla.com/gofaster/#/executiontime/test
Comment 12•12 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #11)
> It doesn't help too much that our average test runtime is greatest on
> Windows:
> http://brasstacks.mozilla.com/gofaster/#/executiontime/test
I looked at xpcshell tests in bug 617503. That work only scratched the surface. There's probably a lot more that can be investigated / done to speed them up.
Comment 13•12 years ago
Any actions left for this bug? Wait times are better now.
I'm going to open a new one to add those rev3 minis that we have scavenged.
We've also added a handful of win7 and winxp staging machines as production.
We've cleaned up a few machines that were in limbo.
This morning we deployed a _dumbwin32proc.py, which will allow the XP slaves to cancel builds rather than run them all the way through.
From this week (Tue, 12 Mar 2013):
Wait: 43605/82.68% (testpool)
xp: 4413
0: 3207 72.67%
15: 711 16.11%
30: 152 3.44%
45: 40 0.91%
60: 79 1.79%
75: 27 0.61%
90+: 197 4.46%
From last week (Wed, 06 Mar 2013):
Wait: 42168/74.70% (testpool)
xp: 4417
0: 2618 59.27%
15: 677 15.33%
30: 181 4.10%
45: 83 1.88%
60: 43 0.97%
75: 41 0.93%
90+: 774 17.52%
win7: 4404
0: 2423 55.02%
15: 829 18.82%
30: 268 6.09%
45: 152 3.45%
60: 38 0.86%
75: 46 1.04%
90+: 648 14.71%
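For reference, a small sketch of how the per-bucket percentages above can be reproduced from the raw counts. It assumes each label is the lower edge of a 15-minute wait bucket, with everything at 90 minutes or more lumped together; that reading of the report, and the function names, are assumptions for illustration. The counts are the xp figures quoted above.

```python
BUCKET_MINUTES = 15
LAST_BUCKET = 90  # waits of 90 minutes or more share one bucket


def bucket_for(wait_minutes):
    """Return the lower edge of the bucket a wait time falls into."""
    return min(int(wait_minutes // BUCKET_MINUTES) * BUCKET_MINUTES, LAST_BUCKET)


def summarize(counts):
    """Map bucket -> (count, percent of all jobs, rounded to 2 places)."""
    total = sum(counts.values())
    return {b: (n, round(100.0 * n / total, 2)) for b, n in sorted(counts.items())}


# xp counts copied from the two reports above.
xp_this_week = {0: 3207, 15: 711, 30: 152, 45: 40, 60: 79, 75: 27, 90: 197}
xp_last_week = {0: 2618, 15: 677, 30: 181, 45: 83, 60: 43, 75: 41, 90: 774}

for label, counts in (("this week", xp_this_week), ("last week", xp_last_week)):
    print("xp %s: %d jobs" % (label, sum(counts.values())))
    for bucket, (n, pct) in summarize(counts).items():
        suffix = "+" if bucket == LAST_BUCKET else ""
        print("  %d%s: %d %.2f%%" % (bucket, suffix, n, pct))
```

Comparing the two runs, the share of xp jobs waiting 90 minutes or more drops from 17.52% to 4.46%.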
Reporter
Comment 14•12 years ago
(In reply to Armen Zambrano G. [:armenzg] from comment #13)
> We've deployed this morning a _dumbwin32proc.py which will allow the XP
> slaves to cancel builds rather than run all the way.
Ah great :-)
Comment 15•12 years ago
(In reply to Armen Zambrano G. [:armenzg] (Release Engineering) from comment #13)
> Any actions left for this bug? Wait times are better now.
>
> I'm going to open a new one to add those rev3 minis that we have scavenged.
> We've also added a handful of win7 and winxp staging machines as production.
> We've cleaned few machines that were in limbo.
> We've deployed this morning a _dumbwin32proc.py which will allow the XP
> slaves to cancel builds rather than run all the way.
>
> From this week (Tue, 12 Mar 2013):
> Wait: 43605/82.68% (testpool)
> xp: 4413
> 0: 3207 72.67%
> 15: 711 16.11%
> 30: 152 3.44%
> 45: 40 0.91%
> 60: 79 1.79%
> 75: 27 0.61%
> 90+: 197 4.46%
>
> From last week (Wed, 06 Mar 2013):
> Wait: 42168/74.70% (testpool)
> xp: 4417
> 0: 2618 59.27%
> 15: 677 15.33%
> 30: 181 4.10%
> 45: 83 1.88%
> 60: 43 0.97%
> 75: 41 0.93%
> 90+: 774 17.52%
>
> win7: 4404
> 0: 2423 55.02%
> 15: 829 18.82%
> 30: 268 6.09%
> 45: 152 3.45%
> 60: 38 0.86%
> 75: 46 1.04%
> 90+: 648 14.71%
edmorley: We're still getting consistently good wait times... anything left to do here or can we close this bug as FIXED?
Flags: needinfo?(emorley)
Reporter
Comment 16•12 years ago
Looks fine to me now, thank you :-)
Status: NEW → RESOLVED
Closed: 12 years ago
Flags: needinfo?(emorley)
Resolution: --- → FIXED
Assignee
Updated•11 years ago
Product: mozilla.org → Release Engineering
Assignee
Updated•7 years ago
Component: General Automation → General