Closed
Bug 801955
Opened 11 years ago
Closed 9 years ago
Run Valgrind builds on all mozilla-central-based trees (inbound, try, fx-team, etc)
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gkw, Assigned: catlee)
References
Details
(Whiteboard: [valgrind])
Attachments
(2 files, 1 obsolete file)
1.52 KB,
patch
|
jhopkins
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
2.20 KB,
patch
|
nthomas
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
We should adapt Valgrind tbpl builds to run on other branches. Some that I can think of, off the top of my head, include: * mozilla-inbound * fx-team * services-central and possibly: * mozilla-aurora * mozilla-beta * mozilla-release (any future ESR branch)
Comment 1•11 years ago
|
||
I think we should stick to trunk for now - given that unless we backport fixes, the Valgrind runs on them are going to be red until everything works its way through the trains. At the minimum we should run them on: * mozilla-central * mozilla-inbound * fx-team * try ...but I suppose we could enable on all trunk repos if the load wasn't an issue. Also, to reduce load, we could enable them on Try, but not include in the default 'all' TryChooser group, meaning people have to request them specifically (if this is easily doable). Adjusting summary to mention Try (also, TBPL is just the dashboard, it's the buildbot scheduling that needs to be adjusted).
Blocks: 800435
Summary: Adapt Valgrind tbpl builds to run on other branches, e.g. mozilla-inbound, fx-team etc. → Run Valgrind builds on more trunk trees + Try
Comment 2•11 years ago
|
||
You mean you Need a way to exclude jobs from Trychooser's "all"? Bug 691177.
Comment 3•11 years ago
|
||
Ah yes that was the bug (I'm even CCed) :-)
![]() |
||
Comment 4•11 years ago
|
||
Enabling on mozilla-inbound seems like the most important of those cases in comment 1. As for enabling it on try... I'm trying to imagine the use case whereby someone would decide that their patch needs to enable the Valgrind testing. It wouldn't hurt to have, but I suspect it wouldn't get used much.
![]() |
Reporter | |
Comment 5•11 years ago
|
||
Also, bug 631838 is the bug to get Valgrind builds on Try.
Comment 6•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #4) > As for enabling it on try... I'm trying to imagine the use case > whereby someone would decide that their patch needs to enable the Valgrind > testing. It wouldn't hurt to have, but I suspect it wouldn't get used much. The use-case is "something in the last 24 hours turned Valgrind red, but we don't know what and have no way to run on Try". Running on Try is normally a prerequisite for unhiding things on TBPL (though as mentioned in bug 800435, the suppression file does make this slightly different; though without Try and thus a simple way to bisect, I imagine new suppressed failures will just end of being filed and ignored).
![]() |
Reporter | |
Comment 7•11 years ago
|
||
> I imagine new suppressed failures will just end of being filed and ignored). We track them in bug 793882.
Comment 8•11 years ago
|
||
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #7) > > I imagine new suppressed failures will just end of being filed and ignored). > > We track them in bug 793882. Yeah I realise - and you've done an amazing job of getting everything filed and suppressed so we have Valgrind green - but I'm just wondering how sustainable it is for only a couple of people to have a working Valgrind environment to determine regression-ranges in the future? It just seems like running on Try would alleviate some of the load on you...
![]() |
Reporter | |
Comment 9•11 years ago
|
||
> Yeah I realise - and you've done an amazing job of getting everything filed > and suppressed so we have Valgrind green Thank you! I wouldn't have been able to do it without help from you and some of the other folks around here though. It's a team effort. > but I'm just wondering how > sustainable it is for only a couple of people to have a working Valgrind > environment to determine regression-ranges in the future? It just seems like > running on Try would alleviate some of the load on you... Yes, I agree. Now if only someone could make some magic and help us put Valgrind on Try.. :)
![]() |
||
Comment 10•11 years ago
|
||
> The use-case is "something in the last 24 hours turned Valgrind red, but we
> don't know what and have no way to run on Try".
>
> Running on Try is normally a prerequisite for unhiding things on TBPL
Gotcha. Thanks!
![]() |
Reporter | |
Comment 11•11 years ago
|
||
Clarifying summary. This bug is the major blocker that blocks unhiding a green tree on TBPL.
See Also: → 631838
Summary: Run Valgrind builds on more trunk trees + Try → Run Valgrind builds on mozilla-inbound, fx-team and Try
Assignee | ||
Comment 12•11 years ago
|
||
How about we run valgrind per-push on these branches?
![]() |
Reporter | |
Comment 13•11 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #12) > How about we run valgrind per-push on these branches? If Releng is fine with the load, I'm fine with it. :) The concern in the past was that it would take up too much resources, but now that they're on AWS we might be able to scale up easily. Adjusting summary.
Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try → Run Valgrind builds on mozilla-inbound, fx-team and Try, per push
Assignee | ||
Updated•11 years ago
|
Priority: -- → P3
Whiteboard: [valgrind]
![]() |
Reporter | |
Comment 14•11 years ago
|
||
Getting Valgrind green on tbpl is a Q4 goal for us, and this bug is a major blocker to that. Chris, may I know what else needs to be done to help move this bug forward?
Flags: needinfo?(catlee)
Assignee | ||
Comment 15•11 years ago
|
||
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #14) > Getting Valgrind green on tbpl is a Q4 goal for us, and this bug is a major > blocker to that. > > Chris, may I know what else needs to be done to help move this bug forward? There's certainly a cost associated with running these per-checkin. Running both linux-32 and linux-64 results in about 2 hours of compute time per push. Is it worth it? Are we catching good bugs with valgrind? Can we choose a different schedule (e.g. every 4 hours) that keeps it visible on the tree, but doesn't use as many resources?
Flags: needinfo?(catlee)
![]() |
Reporter | |
Comment 16•11 years ago
|
||
> Is it worth it? Are we catching good bugs with valgrind? Yes, we've found bug 795395, bug 795631 and bug 803386 - all initially marked security-sensitive (some still are!) > There's certainly a cost associated with running these per-checkin. Running > both linux-32 and linux-64 results in about 2 hours of compute time per push. This will benefit the sheriffs as they can be clear as to which checkin caused the error. > Can we choose a different schedule (e.g. every 4 hours) that keeps it > visible on the tree, but doesn't use as many resources? I'll let the sheriffs chime in on this. Ed, maybe you have some ideas?
Flags: needinfo?(bmo)
Comment 17•11 years ago
|
||
That depends on whether we've done a complete 180 degree turn from the plan as laid out in bug 800435, which was that whenever it failed, we would just file a bug and push a suppression. If we have, and it's now to be something which may not ever be made to fail and a push which causes it to fail must be backed out, then we want it running on push on every push on every tree. If not, then we're perfectly happy with nightly-only on mozilla-central (well, unless we're also expected to determine who is at fault, along with filing the bug and pushing the suppression), and the only question is whether or not it has to be made visible in order to shift the burden of having to file the bug and push the suppression onto edmorley except when he's gone when we'd push it onto me or RyanVM.
![]() |
||
Comment 18•11 years ago
|
||
> That depends on whether we've done a complete 180 degree turn from the plan
> as laid out in bug 800435, which was that whenever it failed, we would just
> file a bug and push a suppression.
I think file-a-bug-and-push-a-suppression is probably the better way to go, for the following reasons.
- Valgrind is resource-hungry even on the short test runs that we currently have enabled. We have bugs open to run it on much larger test suites, for which running on every push seems like a bad idea.
- I suspect we'll get failures only moderately often.
- A nice thing about Valgrind is that when it does complain it's usually not that hard to work out the cause -- i.e. not having the exact changeset that caused the problem probably isn't dire.
Please speak up if you disagree with these reasons!
![]() |
Reporter | |
Comment 19•11 years ago
|
||
It's nice to almost have a consensus. I agree with philor and njn. If edmorley is fine with the plan to have Valgrind running on more branches once a day but not per push, and to file a bug and add a suppression each time one shows up, I'd say let's go with it. (Ed, please change the summary to remove "per push" as well)
Comment 20•11 years ago
|
||
Going by comments 17-19, I think we should: 1) Just run once per day, but possibly on more trees (though how likely are say changes to browser/ in fx-team likely to cause Valgrind regressions?). 2) Unhide to start with, until we get an idea as to how often it's going to break. 3) Once we have the answer to #2, make a final call as to whether it stays unhidden, given the extra burden on (majority volunteer) sheriffs & the fact that people like the jetpack team manage to maintain their hidden builds pretty well. (And also the possible slippery slope of off-loading team-specific tree tasks onto sheriffs).
Flags: needinfo?(bmo)
Comment 21•11 years ago
|
||
You forgot the most important slippery slope: right now, there are exactly zero visible builds which you can ignore and just push on top of unstarred unfiled undealt-with red or orange. Because we are not doing anything about these failures other than hiding them via the suppression file, there is absolutely no point in telling someone who has a patch to push which is blocking a beta but needs to land on the trunk first that they have to learn the (knowing us, undocumented, or documented where nobody knows how to find it) method for hiding this orange, so this will be the first visible build for which it's permissible to push on unstarred failure. And, why make it visible? What is there to be gained by that, in exchange for even more devaluation of the already deeply devalued tree rules? (And, followup, why are we discussing this in an ever-increasing number of different bugs, rather than a newsgroup?)
![]() |
Reporter | |
Comment 22•11 years ago
|
||
I happened to chance on another failure - suppressed at https://hg.mozilla.org/mozilla-central/rev/a50a77da3cf9 Let's aim to run on multiple branches and on Try first, if per-push is not recommended, we can not do that now, and when circumstances change in the future, we can revisit that decision another day.
Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try, per push → Run Valgrind builds on mozilla-inbound, fx-team and Try
Updated•10 years ago
|
Product: mozilla.org → Release Engineering
![]() |
||
Updated•9 years ago
|
Assignee: nobody → catlee
![]() |
||
Comment 24•9 years ago
|
||
catlee volunteered to do this. As for exactly which trees need it, https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy#3.29_Runs_on_mozilla-central_and_all_trees_that_merge_into_it says: > 3) Runs on mozilla-central and all trees that merge into it > > Otherwise job failures when tree X merges into mozilla-central will not > be attributable to a single changeset, resulting in either tree closure > or backout of the entire merge (see requirement #2). > > When filing the release engineering bug to enable your job on all the > required trees, ask to enable it on "mozilla-central based trees" and > release engineering will enable it in the default config from which all > trunk trees inherit (unless the various tree owners have explicitly opted > out). As a rough guide, mozilla-central based trees include > mozilla-inbound, fx-team, services-central, ionmonkey, graphics as well > as many of the other project/disposable repositories.
![]() |
||
Updated•9 years ago
|
Summary: Run Valgrind builds on mozilla-inbound, fx-team and Try → Run Valgrind builds on all mozilla-central-based trees (inbound, try, fx-team, etc)
Assignee | ||
Comment 25•9 years ago
|
||
this changes the default on project branches to True. All the release branches are set as False already, so this patch ends up enabling valgrind builds on all non-release branches.
Attachment #8343717 -
Flags: review?(jhopkins)
Updated•9 years ago
|
Attachment #8343717 -
Flags: review?(jhopkins) → review+
Assignee | ||
Comment 26•9 years ago
|
||
Comment on attachment 8343717 [details] [diff] [review] enable valgrind builds everywhere https://hg.mozilla.org/build/buildbot-configs/rev/dcc6288a8d29
Attachment #8343717 -
Flags: checked-in+
Comment 27•9 years ago
|
||
In production
![]() |
||
Comment 28•9 years ago
|
||
It's working on m-c, but not on m-i, for example: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&showall=1&jobname=valgrind There are multiple of these in the log: abort: unknown revision '6ecda297fb371a680efcd49e4a90e8164c561498'! philor suggested it's because m-c is being cloned?
Assignee | ||
Comment 29•9 years ago
|
||
Yes, m-c is being cloned because of http://hg.mozilla.org/build/tools/file/d21da1c08036/scripts/valgrind/valgrind.sh#l7 I'm working on a patch
Assignee | ||
Comment 30•9 years ago
|
||
Attachment #8344843 -
Flags: review?(nthomas)
Comment 31•9 years ago
|
||
Comment on attachment 8344843 [details] [diff] [review] Use build properties to determine which repo to clone r+ on the later version from IRC (https://diff.pastebin.mozilla.org/3744166) which >diff --git a/scripts/valgrind/valgrind.sh b/scripts/valgrind/valgrind.sh >+ wget -O branches.json $BRANCHES_JSON swaps this to the copy of tools we're running out of: BRANCHES_JSON=$SCRIPTS_DIR/buildfarm/maintenance/production-branches.json HG_REPO=$($JSONTOOL -k ${branch}.repo branches.json) and no wget. >+ echo python $SCRIPTS_DIR/clobberer/clobberer.py -s scripts -s $(basename $PROPERTIES_FILE) \ >+ $CLOBBERER_URL $branch "$builder" $builddir $slavename $master) And removed this echo.
Attachment #8344843 -
Flags: review?(nthomas) → review+
Assignee | ||
Comment 32•9 years ago
|
||
Use the production-branches.json we already have! Also, remove extra echos.
Attachment #8344843 -
Attachment is obsolete: true
Attachment #8344865 -
Flags: review?(nthomas)
Updated•9 years ago
|
Attachment #8344865 -
Flags: review?(nthomas) → review+
Assignee | ||
Updated•9 years ago
|
Attachment #8344865 -
Flags: checked-in+
![]() |
||
Comment 33•9 years ago
|
||
Appears to be working -- Valgrind is running successfully (well, failing, but not due to buildbot config issues) on mozilla-inbound and fx-team. If I do a |-p all| push to try, will Valgrind run there?
![]() |
||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•5 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•