Closed
Bug 827491
Opened 13 years ago
Closed 13 years ago
strange try_spidermonkey scheduling behaviour
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: sfink)
Attachments
(1 file)
1000 bytes, patch (catlee: review+, sfink: checked-in+)
There have been a few days now where we get a few hundred 'Linux x86-64 try leak test spidermonkey_try-rootanalysis build' builds created at the same time.
Something happens so that the 'try_spidermonkey' scheduler suddenly finds hundreds of unprocessed change objects.
I don't have much else to go on here...some log data below:
2013-01-07 12:07:39-0800 [-] Looking at changes: [<buildbot.changes.changes.Change instance at 0x2aaab714a758>, <buildbot.changes.changes.Change instance at 0xf7c0128>, <buildbot.changes.changes.Change instance at 0x15b3eea8>, ...
2013-01-07 12:07:39-0800 [-] Found try message in the change comments, ignoring push comments
2013-01-07 12:07:39-0800 [-] TryChooser OPTIONS : MESSAGE Namespace(build=['opt'], talos=u'none', test=u'none', user_platforms=[u'linux64']) : try: -b o -e -p linux64 -u none -t none
...
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '54f6b3d002494cafb23d44024931149e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '1f848ab044004bb8b772e885829b4cc4'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '2ff040d72529401491df52941d33c936'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': 'f7c4c304e23d4c128b0eccb48902917a'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '749de056104b483588f35c1a4934cd7e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '1b00c69154af4f9e8ef90abbe13c692e'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': 'f20e0506d374415d864a06b2b85b8ee8'}
2013-01-07 12:07:39-0800 [-] try_spidermonkey: propfunc returned {'builduid': '0af3e68d3d564e88b30d28bc83a32b41'}
I believe this is responsible for the majority of the builds waiting > 90 minutes in the trybuildpool report.
Comment 1 • 13 years ago (Assignee)
On a related note, I have seen a number of try pushes that should have gotten these builds and didn't.
So I'm unclear on whether it's dredging up old changes to schedule builds for, or it's just ignoring builds for a while and then processing them in a clump.
Can you tell how many changes are pending in the database, and when they are from?
Comment 2 • 13 years ago (Assignee)
An example recent push that is missing these builds: https://tbpl.mozilla.org/?tree=Try&rev=20e3cbac0414&noignore=1
It's from Jan 7 (today) 7:05am PST. Maybe it would be useful to look at to see if its builds are getting delayed/batched or something.
I guess I should figure out how to get into the build VPN so I could dig further with the buildbot HTML UI.
Comment 3 • 13 years ago (Reporter)
the database thinks it's up-to-date right now
Comment 4 • 13 years ago (Reporter)
ah, here's something...
mysql> select * from scheduler_changes where schedulerid=4715;
+-------------+----------+-----------+
| schedulerid | changeid | important |
+-------------+----------+-----------+
| 4715 | 2000424 | 0 |
| 4715 | 2000378 | 0 |
| 4715 | 2000338 | 0 |
| 4715 | 2000377 | 0 |
| 4715 | 2000663 | 0 |
| 4715 | 2000534 | 0 |
| 4715 | 2000484 | 0 |
| 4715 | 2000447 | 0 |
| 4715 | 2000296 | 0 |
+-------------+----------+-----------+
so the scheduler doesn't think these changes are important...
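For the earlier question about how many changes are pending, a per-scheduler summary by importance is one way to look at it. The sketch below is only an illustration: it uses an in-memory SQLite database as a stand-in for the real MySQL scheduler database, reuses the table and column names from the query above, and the helper name `pending_by_importance` is made up.

```python
import sqlite3


def pending_by_importance(conn, schedulerid):
    """Return {important_flag: count} for changes queued on one scheduler."""
    rows = conn.execute(
        "SELECT important, COUNT(*) FROM scheduler_changes "
        "WHERE schedulerid = ? GROUP BY important",
        (schedulerid,),
    ).fetchall()
    return dict(rows)


# Stand-in database (assumption): same columns as the MySQL table queried above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduler_changes "
    "(schedulerid INTEGER, changeid INTEGER, important INTEGER)"
)

# The nine change ids shown in the query output above, all with important = 0.
changeids = [2000424, 2000378, 2000338, 2000377, 2000663,
             2000534, 2000484, 2000447, 2000296]
conn.executemany(
    "INSERT INTO scheduler_changes VALUES (4715, ?, 0)",
    [(c,) for c in changeids],
)

summary = pending_by_importance(conn, 4715)
print(summary)
```

A scheduler whose pending rows are all `important = 0` is holding changes that will not trigger a build on their own.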
Comment 5 • 13 years ago (Assignee)
I didn't see the last comment until well after this IRC snippet:
[14:07] catlee ok, how's this theory
[14:08] http://hg.mozilla.org/build/buildbotcustom/file/default/misc.py#l2506 looks for changes touching 'js' so we don't run root analysis on all pushes, just when we change js
this results in many changes being marked 'unimportant', esp on try
[14:09] the first 'important' change results in a build that contains all these changes
however on try we explicitly break this up into one build per change
[14:10] so instead of getting one build with an important change + other unimportant changes, we get one build for the 'important' change, and one for all previous 'unimportant' changes
But I think the above theory is correct, and the fix is to set onlyImportant=True when constructing all of the spidermonkey schedulers.
Comment 6 • 13 years ago (Assignee)
Ugh. Except my dev environment says we're using buildbot 0.8.2, which doesn't have the onlyImportant option.
So we'd need a ChangeFilter with a filter_fn defined instead.
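A minimal sketch of what such a filter_fn could look like, assuming it only needs to inspect the list of files on each change. The names `touches_js`, `FakeChange` and `JS_PREFIX` are made up for illustration, and the `js/src/` prefix is an assumption; only the `files` attribute of a change and the `filter_fn` argument of ChangeFilter come from Buildbot. This is not the attached patch.

```python
from collections import namedtuple

# Stand-in for a Buildbot Change object (assumption): only `files` is used.
FakeChange = namedtuple("FakeChange", ["files"])

# Assumed path prefix that marks a change as relevant to SpiderMonkey.
JS_PREFIX = "js/src/"


def touches_js(change):
    """Return True if any file in the change lives under JS_PREFIX."""
    return any(f.startswith(JS_PREFIX) for f in change.files)


# In a master config this would be passed to the scheduler roughly as
#   ChangeFilter(filter_fn=touches_js)
# so that non-JS changes never reach the scheduler at all.

print(touches_js(FakeChange(files=["js/src/jsapi.cpp"])))    # JS change
print(touches_js(FakeChange(files=["browser/app/foo.js"])))  # non-JS change
```

The difference from an importance check is that a rejected change is dropped outright instead of being queued as unimportant.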
Comment 7 • 13 years ago (Assignee)
Attachment #698887 - Flags: review?(catlee)
Updated • 13 years ago (Assignee)
Assignee: nobody → sphink
Comment 8 • 13 years ago (Reporter)
I'm pretty sure this is because the scheduler doesn't consider these changes important due to http://hg.mozilla.org/build/buildbotcustom/file/default/misc.py#l2506
So the scheduler is accumulating a bunch of unimportant changes, which then get triggered all at once when someone pushes a change that *does* touch js/src. And because we use treeStableTimer=None, we create a new build request for each change in the queue, which includes all the previous unimportant changes.
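The accumulation can be illustrated with a toy model. This is not Buildbot code: `simulate` and its arguments are hypothetical, and the rule "one request per queued change, each including the changes queued before it" is just a reading of the description above.

```python
def simulate(pushes, drop_unimportant):
    """Toy model of the scheduler queue.

    pushes: list of (changeid, is_important) in push order.
    drop_unimportant: if True, unimportant changes are filtered out before
        they reach the scheduler; if False, they are queued.
    Returns a list of build requests, each a list of changeids.
    """
    queue = []
    requests = []
    for changeid, important in pushes:
        if drop_unimportant and not important:
            continue  # filtered out, never queued
        queue.append(changeid)
        if important:
            # No stable timer: fire immediately, one request per queued
            # change, each carrying the changes queued up to that point.
            for i in range(len(queue)):
                requests.append(queue[: i + 1])
            queue = []
    return requests


# Three pushes that do not touch JS, followed by one that does.
pushes = [(1, False), (2, False), (3, False), (4, True)]

queued = simulate(pushes, drop_unimportant=False)
filtered = simulate(pushes, drop_unimportant=True)
print(len(queued), len(filtered))
```

With queueing, the single important push releases a request for every change that piled up before it; with filtering, only the important push produces a request.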
Comment 9 • 13 years ago (Reporter)
Comment on attachment 698887
Completely ignore non-JS changes instead of queueing them up
I think you need to import ChangeFilter, but looks good otherwise.
Attachment #698887 - Flags: review?(catlee) → review+
Updated • 13 years ago (Assignee)
Attachment #698887 - Flags: checked-in+
Comment 10 • 13 years ago (Assignee)
http://hg.mozilla.org/build/buildbotcustom/rev/8d9926c1417c
Also had to move the branch argument into the ChangeFilter. Passes test-masters.sh.
Comment 11 • 13 years ago
in production
Comment 12 • 13 years ago (Reporter)
looks like this is fixed now
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated • 12 years ago
Product: mozilla.org → Release Engineering
Updated • 7 years ago
Component: General Automation → General