Closed Bug 827491 Opened 13 years ago Closed 13 years ago

strange try_spidermonkey scheduling behaviour

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: sfink)

Details

Attachments

(1 file)

There have been a few days now where we get a few hundred 'Linux x86-64 try leak test spidermonkey_try-rootanalysis build' builds created at the same time. Something happens so that the 'try_spidermonkey' scheduler suddenly finds hundreds of unprocessed change objects. I don't have much else to go on here...some log data below:

2013-01-07 12:07:39-0800 [-] Looking at changes: [<buildbot.changes.changes.Change instance at 0x2aaab714a758>, <buildbot.changes.changes.Change instance at 0xf7c0128>, <buildbot.changes.changes.Change instance at 0x15b3eea8>, ...
2013-01-07 12:07:39-0800 [-] Found try message in the change comments, ignoring push comments
2013-01-07 12:07:39-0800 [-] TryChooser OPTIONS : MESSAGE Namespace(build=['opt'], talos=u'none', test=u'none', user_platforms=[u'linux64']) : try: -b o -e -p linux64 -u none -t none
...
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '54f6b3d002494cafb23d44024931149e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '1f848ab044004bb8b772e885829b4cc4'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '2ff040d72529401491df52941d33c936'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': 'f7c4c304e23d4c128b0eccb48902917a'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '749de056104b483588f35c1a4934cd7e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '1b00c69154af4f9e8ef90abbe13c692e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': 'f20e0506d374415d864a06b2b85b8ee8'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '0af3e68d3d564e88b30d28bc83a32b41'}

I believe this is responsible for the majority of the builds waiting > 90 minutes in the trybuildpool report.
On a related note, I have seen a number of try pushes that should have gotten these builds and didn't. So I'm unclear on whether it's dredging up old changes to schedule builds for, or it's just ignoring builds for a while and then processing them in a clump. Can you tell how many changes are pending in the database, and when they are from?
An example recent push that is missing these builds: https://tbpl.mozilla.org/?tree=Try&rev=20e3cbac0414&noignore=1 It's from Jan 7 (today) 7:05am PST. Maybe it would be useful to look at to see if its builds are getting delayed/batched or something. I guess I should figure out how to get into the build VPN so I could dig further with the buildbot HTML UI.
the database thinks it's up-to-date right now
ah, here's something...

mysql> select * from scheduler_changes where schedulerid=4715;
+-------------+----------+-----------+
| schedulerid | changeid | important |
+-------------+----------+-----------+
|        4715 |  2000424 |         0 |
|        4715 |  2000378 |         0 |
|        4715 |  2000338 |         0 |
|        4715 |  2000377 |         0 |
|        4715 |  2000663 |         0 |
|        4715 |  2000534 |         0 |
|        4715 |  2000484 |         0 |
|        4715 |  2000447 |         0 |
|        4715 |  2000296 |         0 |
+-------------+----------+-----------+

so the scheduler doesn't think these changes are important...
I didn't see the last comment until well after this IRC snippet:

[14:07] catlee ok, how's this theory
[14:08] http://hg.mozilla.org/build/buildbotcustom/file/default/misc.py#l2506 looks for changes touching 'js' so we don't run root analysis on all pushes, just when we change js this results in many changes being marked 'unimportant', esp on try
[14:09] the first 'important' change results in a build that contains all these changes however on try we explicitly break this up into one build per change
[14:10] so instead of getting one build with an important change + other unimportant changes, we get one build for the 'important' change, and one for all previous 'unimportant' changes

But I think the above theory is correct, and the fix is to set onlyImportant=True when constructing all of the spidermonkey schedulers.
Ugh. Except my dev environment says we're using buildbot 0.8.2, which doesn't have the onlyImportant option. So we'd need a ChangeFilter with a filter_fn defined instead.
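For illustration only, here is roughly what a filter_fn predicate for such a ChangeFilter might look like. This is a sketch, not the attached patch: the name touches_js, the FakeChange stand-in, the "js/src/" prefix rule and the "try" branch value are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a buildbot Change; only the attributes the
# predicate needs are modelled here.
FakeChange = namedtuple("FakeChange", ["files", "branch"])


def touches_js(change):
    """Return True if any file in the change lives under js/src/.

    The exact path rule is an assumption; the real check is in
    buildbotcustom's misc.py.
    """
    return any(f.startswith("js/src/") for f in change.files)


# With buildbot installed, the predicate could be wired in along these lines
# (branch value assumed for illustration):
#   from buildbot.changes.filter import ChangeFilter
#   change_filter = ChangeFilter(branch="try", filter_fn=touches_js)
```

Because the filter rejects a change before the scheduler ever sees it, non-JS changes would never be recorded as pending for that scheduler, rather than being recorded with important=0.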
Assignee: nobody → sphink
I'm pretty sure this is because the scheduler doesn't consider these changes important due to http://hg.mozilla.org/build/buildbotcustom/file/default/misc.py#l2506 So the scheduler is accumulating a bunch of unimportant changes, which then get triggered all at once when someone pushes a change that *does* touch js/src. And because we use treeStableTimer=None, we create a new build request for each change in the queue, which includes all the previous unimportant changes.
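A toy model of the behaviour described above may make it easier to follow. It is not buildbot code: ToyScheduler, its drop_unimportant flag and the simulate helper are hypothetical names invented for this sketch, and the one-request-per-queued-change rule is a simplification of what the try scheduler does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    number: int
    important: bool  # True when the change touches js/src


class ToyScheduler:
    """Simplified model: unimportant changes pile up until an important one arrives."""

    def __init__(self, drop_unimportant: bool = False):
        # drop_unimportant=True models filtering non-JS changes out entirely.
        self.drop_unimportant = drop_unimportant
        self.pending: List[Change] = []
        self.build_requests: List[int] = []

    def add_change(self, change: Change) -> None:
        if not change.important and self.drop_unimportant:
            return  # filtered out before it can be queued
        self.pending.append(change)
        if change.important:
            # No stable timer: fire immediately, one request per queued change.
            self.build_requests.extend(c.number for c in self.pending)
            self.pending = []


def simulate(drop_unimportant: bool) -> List[int]:
    scheduler = ToyScheduler(drop_unimportant)
    for n in range(5):
        scheduler.add_change(Change(n, important=False))
    scheduler.add_change(Change(5, important=True))
    return scheduler.build_requests
```

In this model, simulate(False) yields six build requests from a single JS push, while simulate(True) yields one, which matches the burst seen in the logs and the effect expected from ignoring non-JS changes.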
Comment on attachment 698887
Completely ignore non-JS changes instead of queueing them up

I think you need to import ChangeFilter, but looks good otherwise.
Attachment #698887 - Flags: review?(catlee) → review+
Attachment #698887 - Flags: checked-in+
http://hg.mozilla.org/build/buildbotcustom/rev/8d9926c1417c Also had to move the branch argument into the ChangeFilter. Passes test-masters.sh.
in production
looks like this is fixed now
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General