Closed Bug 1199226 Opened 10 years ago Closed 10 years ago

Very high test backlog

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1199347

People

(Reporter: RyanVM, Unassigned)

Details

Linux32 tests are over 1700 pending and Linux64 over 2500. All trees closed.
And now all the Windows platforms shot up into the thousands.
9:45:43 AM - nagios-releng: Thu 06:45:42 PDT [4044] cruncher.srv.releng.scl3.mozilla.com:Pending builds is CRITICAL: CRITICAL Pending Builds: 9938 (http://m.mozilla.org/Pending+builds)
Yeah, that's bad.
Summary: Very high AWS test backlog → Very high test backlog
If we focus only on https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=0077cc462038&filter-searchStr=Ubuntu%20VM%2012.04%20fx-team%20opt%20test%20cppunit and open buildapi for that revision:
https://secure.pub.build.mozilla.org/buildapi/self-serve/fx-team/rev/0077cc462038
we can see that the first job [1] was scheduled by:
"scheduler": "tests-fx-team-ubuntu32_vm-opt-unittest"
while the second [2] was scheduled by:
"scheduler": "tests-fx-team-ubuntu32_vm-opt-unittest-7-3600"
What is this -7-3600 scheduler? It sounds like it is related to:
http://mxr.mozilla.org/build/source/buildbot-configs/mozilla-tests/config_seta.py#9
[1] https://secure.pub.build.mozilla.org/buildapi/self-serve/fx-team/build/80032037
[2] https://secure.pub.build.mozilla.org/buildapi/self-serve/fx-team/build/80047712
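To tell the two kinds of scheduler apart in a list of buildapi results, something like the following could work. This is only a sketch: it assumes the trailing "-7-3600" encodes a change-count limit and a timeout in seconds, which is a guess based on the name and not something buildapi documents here. `SkipConfig` and `parse_scheduler` are made-up names for illustration.

```python
import re
from typing import NamedTuple, Optional


class SkipConfig(NamedTuple):
    base_name: str     # scheduler name without the numeric suffix
    max_changes: int   # ASSUMED meaning: change-count limit
    timeout_s: int     # ASSUMED meaning: timeout in seconds


# Hypothetical naming pattern: "<base>-<count>-<seconds>".
_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<count>\d+)-(?P<secs>\d+)$")


def parse_scheduler(name: str) -> Optional[SkipConfig]:
    """Return the suffix values, or None for a scheduler without the suffix."""
    m = _SUFFIX.match(name)
    if m is None:
        return None
    return SkipConfig(m.group("base"), int(m.group("count")), int(m.group("secs")))


if __name__ == "__main__":
    for name in (
        "tests-fx-team-ubuntu32_vm-opt-unittest",
        "tests-fx-team-ubuntu32_vm-opt-unittest-7-3600",
    ):
        print(name, "->", parse_scheduler(name))
```

A plain scheduler name yields None, so the helper doubles as a quick filter for the suffixed schedulers.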
Current theory is that SETA went bonkers for some reason. The cause remains unknown, but the pending numbers are sane again, so I've reopened the trees. I'm leaving this bug open for the ongoing investigation, but feel free to close it and move the discussion elsewhere if you want.
Severity: blocker → normal
So I looked at the logs on the scheduling master. It looks like SETA stopped skipping jobs from 00:51 until 06:37.
I just grepped for "skipped" in today's logs on bm81:
2015-08-27 00:51:07-0700 [-] tests-fx-team-ubuntu64_vm-opt-unittest-7-3600: skipping with 2/7 important changes since only 151/3600s have elapsed
2015-08-27 00:51:07-0700 [-] tests-fx-team-yosemite-debug-unittest-7-3600: skipping with 1/7 important changes since only 229/3600s have elapsed
2015-08-27 00:51:07-0700 [-] tests-mozilla-inbound-win7-ix-debug-unittest-7-3600: skipping with 1/7 important changes since only 3570/3600s have elapsed
2015-08-27 00:51:07-0700 [-] tests-mozilla-inbound-yosemite-opt-unittest-7-3600: skipping with 1/7 important changes since only 467/3600s have elapsed
2015-08-27 06:37:43-0700 [-] tests-fx-team-ubuntu64_vm_mobile-opt-unittest-7-3600: skipping with 4/7 important changes since only 277/3600s have elapsed
2015-08-27 06:38:05-0700 [-] tests-fx-team-ubuntu64_vm_armv7_large-opt-unittest-7-3600: skipping with 4/7 important changes since only 47/3600s have elapsed
The same problem didn't occur on the 26th; I looked at the logs.
I still don't understand why SETA stopped skipping jobs at 00:51. I also don't understand why it started skipping jobs again at 6:37am. Will investigate.
However, the machine pools were probably able to keep up with the load overnight, but when daytime ET hit, the number of pushes increased and caused the high pending counts without SETA.
Looking at m-i, there certainly was reason to skip jobs from midnight to 6am given the intervals between pushes.
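For anyone repeating this check on another day's logs, here is a rough sketch of finding the longest stretch without any "skipping" message. It assumes every relevant line has exactly the shape of the lines quoted in this comment; `parse_skip_time` and `largest_gap` are hypothetical helper names, not part of buildbot.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Matches lines shaped like the ones quoted in this bug (assumed format).
SKIP_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[-\] "
    r"(?P<scheduler>\S+): skipping with (?P<changes>\d+)/(?P<limit>\d+) "
    r"important changes since only (?P<elapsed>\d+)/(?P<timeout>\d+)s have elapsed$"
)


def parse_skip_time(line: str) -> Optional[datetime]:
    """Return the timestamp of a 'skipping' line, or None for any other line."""
    m = SKIP_LINE.match(line.strip())
    if m is None:
        return None
    return datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S%z")


def largest_gap(
    lines: Iterable[str],
) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (start, end, duration) of the longest interval between skip messages."""
    times = sorted(t for t in map(parse_skip_time, lines) if t is not None)
    if len(times) < 2:
        return None
    start, end = max(zip(times, times[1:]), key=lambda pair: pair[1] - pair[0])
    return start, end, end - start
```

Fed the six lines above, this should report the interval between the 00:51:07 and 06:37:43 entries, which matches the window in which SETA appears to have stopped skipping.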
Opened bug 1199347 to investigate SETA scheduling issues.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard