usebuildbot=1 doesn't show Valgrind builds

RESOLVED FIXED

Status

Release Engineering
General
P2
normal
RESOLVED FIXED
7 years ago
5 years ago

People

(Reporter: philor, Assigned: coop)

Tracking

({regression})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [project])

Attachments

(4 attachments, 3 obsolete attachments)

This is caused by bug 669137.
Depends on: 669137
(Reporter)

Comment 2

7 years ago
It is? The slowest of them took 3 hours according to tinderbox (which lies), most were an hour or an hour and a half.
Okay, then maybe it's not.
(Reporter)

Comment 4

7 years ago
Pretty much everything about the Valgrind builds is a bit odd, from the scheduling to the way they run, so I'm ready to believe there's something missing in what's in the json (or that they don't make it into the json at all - I don't think I've ever seen them show as running, so they may not make it into that json either), but I don't know what we require or what's required to wind up in either json file, or how to catch the json when they are either running or freshly finished, since they just run when they feel like it.
There's a patch on bug 678685 which should set the branch properly for Valgrind.
Depends on: 678685
That patch is now landed, and show get deployed Tuesday morning.
(Reporter)

Updated

7 years ago
Keywords: regression
(Reporter)

Updated

7 years ago
Summary: tbpl.allizom.org/?usebuildbot=1 doesn't show Valgrind builds → usebuildbot=1 doesn't show Valgrind builds
Was actually deployed today. Please reopen if there are further issues.
(Reporter)

Comment 8

7 years ago
You didn't close it, so I can't reopen it, but it still doesn't show, so perhaps it had some other problem besides the branch.
(Reporter)

Updated

7 years ago
Blocks: 630538
No longer blocks: 669000
If this is still happening, could this be caused by bug#685299 ?
That actually means we don't get any logs for debugging, but we should still be getting them on TBPL (like SpiderMonkey now).

I found a couple of issues
1, The builders in the js.gz files don't have branch set to mozilla-central instead of idle, so this is the same as the end of bug 684158. I've fixed that manually to speed it up

2, We have |"revision": null| in the properties for the builds, so tbpl doesn't know what to display the build against. It does have the branch set properly. Hopefully setting the revision property will fix both the running and finished cases.
Might as well have this in RelEng until we've worked all the kinks out on our side.

The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3 idle slaves requirement) so buildbot never sets that revision property when creating the job. Since the script does the checkout, build, and valgrind all in one go we don't get a chance to run a SetProperty step until afterwards. The build wouldn't show as running on tbpl, instead appearing as finished. Alternatively, we could make Valgrind another nightly build, just one that runs after all the normal jobs finish. That'd mix up the project/non-project distinction we're breaking right now. What say you RelEng ?
Component: Tinderboxpushlog → Release Engineering
Product: Webtools → mozilla.org
QA Contact: tinderboxpushlog → release
Version: Trunk → other
(Reporter)

Comment 12

7 years ago
That might or might not be a step in the direction it really wants to go, dunno. If we could ever get it run on try, and get it greened up, what it actually wants to be is a tier-1, break it and you back out, on-change build.
(Reporter)

Comment 13

7 years ago
Despite still not seeing them show up in the display, I noticed while looking for something else in the tree admin panel that valgrind-linux and valgrind-linux64 now show up there, so I hid them (they'll be permared if they ever do show up), so now to know whether or not they're showing up you'll need to use https://tbpl.mozilla.org/?noignore=1&jobname=valgrind (and starting with a &usetinderbox=1 tacked on there, so you know where you should be looking to see if they are there, helps).
(In reply to Nick Thomas [:nthomas] (gone until Sep. 25) from comment #11)
> Might as well have this in RelEng until we've worked all the kinks out on
> our side.
> 
> The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3
> idle slaves requirement) so buildbot never sets that revision property when
> creating the job. Since the script does the checkout, build, and valgrind
> all in one go we don't get a chance to run a SetProperty step until
> afterwards. The build wouldn't show as running on tbpl, instead appearing as
> finished. Alternatively, we could make Valgrind another nightly build, just
> one that runs after all the normal jobs finish. That'd mix up the
> project/non-project distinction we're breaking right now. What say you
> RelEng ?

I don't think it's necessarily a bad thing for the Valgrind builds to be run off of a Nightly scheduler, but I don't know that that'll fix the revision problem immediately. From my reading of the Nightly Scheduler, revision will still be None. So, we probably need some way of upleveling the changeset that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+' seems like a generic enough thing for ScriptFactory support, but I'm curious if Catlee has any thoughts on this.
(In reply to Ben Hearsum [:bhearsum] from comment #14)
> (In reply to Nick Thomas [:nthomas] (gone until Sep. 25) from comment #11)
> > Might as well have this in RelEng until we've worked all the kinks out on
> > our side.
> > 
> > The Valgrind builds aren't on-change (they're on a 24 hours timer with a 3
> > idle slaves requirement) so buildbot never sets that revision property when
> > creating the job. Since the script does the checkout, build, and valgrind
> > all in one go we don't get a chance to run a SetProperty step until
> > afterwards. The build wouldn't show as running on tbpl, instead appearing as
> > finished. Alternatively, we could make Valgrind another nightly build, just
> > one that runs after all the normal jobs finish. That'd mix up the
> > project/non-project distinction we're breaking right now. What say you
> > RelEng ?
> 
> I don't think it's necessarily a bad thing for the Valgrind builds to be run
> off of a Nightly scheduler, but I don't know that that'll fix the revision
> problem immediately. From my reading of the Nightly Scheduler, revision will
> still be None.

If we can find a way to get revision set, however, this is probably the cleanest way.
Whiteboard: [triage-followup]
(Assignee)

Updated

7 years ago
Priority: -- → P3
(Assignee)

Comment 16

7 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #14)
> I don't think it's necessarily a bad thing for the Valgrind builds to be run
> off of a Nightly scheduler, but I don't know that that'll fix the revision
> problem immediately. From my reading of the Nightly Scheduler, revision will
> still be None. So, we probably need some way of upleveling the changeset
> that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+'
> seems like a generic enough thing for ScriptFactory support, but I'm curious
> if Catlee has any thoughts on this.

Why uplevel when we can downlevel, i.e. could we make the valgrind build triggerable after a successful nightly, and pass the nightly changeset to the valgrind script?
tbpl.allizom.org has gone away, but the root problem is still the same. Here are refreshed URLs that reproduce the problem:

https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&usetinderbox=1 does show valgrind builds.

https://tbpl.mozilla.org/?noignore=1&jobname=valgrind does not show valgrind builds.
(In reply to Chris Cooper [:coop] from comment #16)
> (In reply to Ben Hearsum [:bhearsum] from comment #14)
> > I don't think it's necessarily a bad thing for the Valgrind builds to be run
> > off of a Nightly scheduler, but I don't know that that'll fix the revision
> > problem immediately. From my reading of the Nightly Scheduler, revision will
> > still be None. So, we probably need some way of upleveling the changeset
> > that the Valgrind script ends up with. Looking for '^revision: [0-9abcdef]+'
> > seems like a generic enough thing for ScriptFactory support, but I'm curious
> > if Catlee has any thoughts on this.
> 
> Why uplevel when we can downlevel, i.e. could we make the valgrind build
> triggerable after a successful nightly, and pass the nightly changeset to
> the valgrind script?

These jobs do their own builds, so I don't think it makes sense to have them depend on nightly jobs...
(Assignee)

Comment 19

7 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #18) 
> These jobs do their own builds, so I don't think it makes sense to have them
> depend on nightly jobs...

OK, could we just pass them the same changeset that we pass to the other nightlies and just make them another nightly build?
(In reply to Chris Cooper [:coop] from comment #19)
> (In reply to Ben Hearsum [:bhearsum] from comment #18) 
> > These jobs do their own builds, so I don't think it makes sense to have them
> > depend on nightly jobs...
> 
> OK, could we just pass them the same changeset that we pass to the other
> nightlies and just make them another nightly build?

That's what I was describing in #14, but because we don't use any of the Buildbot Source steps, I'm not sure revision will be set. Might be worth a try, though.
(Assignee)

Comment 21

7 years ago
I mucked with this a bit on Friday, and was able to get Valgrind running as a nightly build and it *did* get the revision set. Now I just have to alter how the script factory is called to pass the revision.
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: P3 → P2
Whiteboard: [triage-followup] → [project]
(Assignee)

Comment 22

6 years ago
Created attachment 566898 [details] [diff] [review]
Allow valgrind.sh to grab a REVISION from the env
Attachment #566898 - Flags: review?(bhearsum)
(Assignee)

Comment 23

6 years ago
Created attachment 566901 [details] [diff] [review]
Change valgrind to a nightly builder

I've subclassed ScriptFactory as ValgrindFactory (felt kinda dirty about that), but it was the easiest way to get the revisions into buildbot. It's shorter than generateValgrindObjects which it replaces, though.
Attachment #566901 - Flags: review?(bhearsum)
(Assignee)

Comment 24

6 years ago
Created attachment 566903 [details] [diff] [review]
Config changes required to run valgrind as a nightly builder
Attachment #566903 - Flags: review?(bhearsum)
Comment on attachment 566901 [details] [diff] [review]
Change valgrind to a nightly builder

>diff --git a/misc.py b/misc.py
>+            if config['enable_valgrind'] and \
>+               platform in config['valgrind_platforms']:
>+                valgrind_env=platform_env
>+                valgrind_env['REVISION'] = WithProperties("%(revision)s")

Drive-by comment - we normally copy envs so that other builders don't get modified.
Comment on attachment 566901 [details] [diff] [review]
Change valgrind to a nightly builder

Review of attachment 566901 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going back and forth on this approach...on one hand, I feel like it's a step backwards to change this lightweight project into something attached to the monolithic set of branch configs. On the other hand, this _is_ a Firefox job, so maybe it belongs there? I'm curious what Catlee thinks about this, as he set these jobs up in the first place.

Aside from that, I'm not sure why Valgrind needs its own factory. The only thing it does is override got_revision and revision. However, now that you're triggering Valgrind jobs through a Scheduler which sets revision, I don't think there's any need for that.
Attachment #566901 - Flags: review?(catlee)
Attachment #566901 - Flags: review?(bhearsum)
Attachment #566901 - Flags: review-
Attachment #566898 - Flags: review?(bhearsum) → review+
Comment on attachment 566903 [details] [diff] [review]
Config changes required to run valgrind as a nightly builder

Review of attachment 566903 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla/config.py
@@ +86,5 @@
>      'enable_blocklist_update': False,
>      'blocklist_update_on_closed_tree': False,
>      'enable_nightly': True,
>      'enabled_products': ['firefox', 'mobile'],
> +    'enable_valgrind': False,

Can you make the default to this True, if you end up sticking with this approach? ISTR that we try to keep mozilla-central's config the same as GLOBAL_VARS.
Attachment #566903 - Flags: review?(bhearsum) → review-
(Assignee)

Comment 28

6 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #26) 
> Aside from that, I'm not sure why Valgrind needs its own factory. The only
> thing it does is override got_revision and revision. However, now that
> you're triggering Valgrind jobs through a Scheduler which sets revision, I
> don't think there's any need for that.

I did this because if the ScriptFactory is ever force-built without setting the revision, the revision won't bubble up. Maybe we don't care about that?

I can tack the SetProperty step onto the existing ScriptFactory call in misc.py instead of subclassing - is that acceptable?
(Assignee)

Comment 29

6 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #27) 
> Can you make the default to this True, if you end up sticking with this
> approach? ISTR that we try to keep mozilla-central's config the same as
> GLOBAL_VARS.

Really? So an entry per branch instead of just one for m-c? That seems inefficient unless we expect it to run most places, but I'll bow to best practice here.
(In reply to Chris Cooper [:coop] from comment #29)
> (In reply to Ben Hearsum [:bhearsum] from comment #27) 
> > Can you make the default to this True, if you end up sticking with this
> > approach? ISTR that we try to keep mozilla-central's config the same as
> > GLOBAL_VARS.
> 
> Really? So an entry per branch instead of just one for m-c? That seems
> inefficient unless we expect it to run most places, but I'll bow to best
> practice here.

That's the way I've thought of it for the last few years. It's the config used for both mozilla-central and for project branches, so it makes a lot of sense IMHO.
(Assignee)

Comment 31

6 years ago
Created attachment 567116 [details] [diff] [review]
Add generic property-setting steps to ScriptFactory
Attachment #566901 - Attachment is obsolete: true
Attachment #566901 - Flags: review?(catlee)
Attachment #567116 - Flags: review?(bhearsum)
(Assignee)

Comment 32

6 years ago
Created attachment 567118 [details] [diff] [review]
Config changes required to run valgrind as a nightly builder, v2

Makes the default True for enable_valgrind on m-c, other branches must pref it off, as requested.
Attachment #566903 - Attachment is obsolete: true
Attachment #567118 - Flags: review?(bhearsum)
(Assignee)

Comment 33

6 years ago
Created attachment 567119 [details] [diff] [review]
Put revision into files in properties dir for consumption by buildbot
Attachment #566898 - Attachment is obsolete: true
Attachment #567119 - Flags: review?(bhearsum)
Attachment #567118 - Flags: review?(bhearsum) → review+
Attachment #567116 - Flags: review?(bhearsum) → review+
Attachment #567119 - Flags: review?(bhearsum) → review+
(Assignee)

Comment 34

6 years ago
Comment on attachment 567119 [details] [diff] [review]
Put revision into files in properties dir for consumption by buildbot

https://hg.mozilla.org/build/tools/rev/95eb6eb0311a
Attachment #567119 - Flags: checked-in+
(Assignee)

Comment 35

6 years ago
Comment on attachment 567116 [details] [diff] [review]
Add generic property-setting steps to ScriptFactory

https://hg.mozilla.org/build/buildbotcustom/rev/4b2cc709a159
Attachment #567116 - Flags: checked-in+
(Assignee)

Comment 36

6 years ago
Comment on attachment 567118 [details] [diff] [review]
Config changes required to run valgrind as a nightly builder, v2

https://hg.mozilla.org/build/buildbot-configs/rev/f93cf77775d3
Attachment #567118 - Flags: checked-in+
(Assignee)

Updated

6 years ago
Flags: needs-reconfig?
Comment on attachment 567119 [details] [diff] [review]
Put revision into files in properties dir for consumption by buildbot

This |cat| step failed on the SeaMonkey Blocklist Update Script Factory. This was the last step run. This as-is is suboptimal, as I suspect it may cause other ScriptFactory things to fail.

bash -c 'cat *'
 in dir /builds/slave/comm-cen-trunk-lnx-blocklistupdate/properties (timeout 1200 secs)
 watching logfiles {}
 argv: ['bash', '-c', 'cat *']
 environment:
  CC=/tools/gcc/bin/gcc
  CVS_RSH=ssh
  CXX=/tools/gcc/bin/g++
  G_BROKEN_FILENAMES=1
  HISTSIZE=1000
  HOME=/home/seabld
  HOSTNAME=cn-sea-qm-centos5-01
  INPUTRC=/etc/inputrc
  JAVA_HOME=/builds/jdk
  LANG=en_US.UTF-8
  LESSOPEN=|/usr/bin/lesspipe.sh %s
  LOGNAME=seabld
  LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:ex=01;32:*.cmd=01;32:*.exe=01;32:*.com=01;32:*.btm=01;32:*.bat=01;32:*.sh=01;32:*.csh=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tz=01;31:*.rpm=01;31:*.cpio=01;31:*.jpg=01;35:*.gif=01;35:*.bmp=01;35:*.xbm=01;35:*.xpm=01;35:*.png=01;35:*.tif=01;35:
  MAIL=/var/spool/mail/seabld
  PATH=/opt/local/bin:/tools/python/bin:/tools/buildbot/bin:/tools/python/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/seabld/bin
  PWD=/builds/slave/comm-cen-trunk-lnx-blocklistupdate/properties
  SHELL=/bin/bash
  SHLVL=1
  SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
  TBOX_CLIENT_CVS_DIR=/builds/tinderbox/mozilla/tools
  TERM=linux
  USER=seabld
  _=/tools/python/bin/python
 using PTY: False
cat: *: No such file or directory
program finished with exit code 1
elapsedTime=0.077063
Attachment #567119 - Flags: feedback-
(Assignee)

Comment 38

6 years ago
(In reply to Justin Wood (:Callek) from comment #37)
> This |cat| step failed on the SeaMonkey Blocklist Update Script Factory.
> This was the last step run. This as-is is suboptimal, as I suspect it may
> cause other ScriptFactory things to fail.

Would something like "for file in `ls -1`; do cat $file; done" be acceptable?
Preproduction release failed during tagging:
======
bash -c 'cat *'
 in dir /builds/slave/rel-m-beta-firefox-tag/properties (timeout 1200 secs)
 watching logfiles {}
 argv: ['bash', '-c', 'cat *']
......
cat: *: No such file or directory
program finished with exit code 1
(In reply to Chris Cooper [:coop] from comment #38)
> (In reply to Justin Wood (:Callek) from comment #37)
> > This |cat| step failed on the SeaMonkey Blocklist Update Script Factory.
> > This was the last step run. This as-is is suboptimal, as I suspect it may
> > cause other ScriptFactory things to fail.
> 
> Would something like "for file in `ls -1`; do cat $file; done" be acceptable?

That could be fine, but I wonder how buildbot behaves on a SetProperty like that with an empty stdout. Better still, imo, would be |flunkOnFailure=False|
(In reply to Justin Wood (:Callek) from comment #40)
> (In reply to Chris Cooper [:coop] from comment #38)
> > (In reply to Justin Wood (:Callek) from comment #37)
> > > This |cat| step failed on the SeaMonkey Blocklist Update Script Factory.
> > > This was the last step run. This as-is is suboptimal, as I suspect it may
> > > cause other ScriptFactory things to fail.
> > 
> > Would something like "for file in `ls -1`; do cat $file; done" be acceptable?
> 
> That could be fine, but I wonder how buildbot behaves on a SetProperty like
> that with an empty stdout.

We should probably find this out rathen than guessing.

> Better still, imo, would be |flunkOnFailure=False|

I think we should do this, too, but I'd still prefer not to have a step that always fails for some jobs.
(Assignee)

Comment 42

6 years ago
Created attachment 567458 [details] [diff] [review]
Iterate over files to make sure they exist

(In reply to Ben Hearsum [:bhearsum] from comment #41)
> We should probably find this out rathen than guessing.

Tested in dev env. If there are no files, and hence, no output, no properties are set. Likewise, if the files containing the properties are not in the correct format, the regexp doesn't match anything so no properties are set.

> > Better still, imo, would be |flunkOnFailure=False|

I've added flunkOnFailure/warnOnFailure now, but honestly we shouldn't hit those cases now that we're iterating over files rather than doing a blanket 'cat.'
Attachment #567458 - Flags: review?(bear)

Updated

6 years ago
Attachment #567458 - Flags: review?(bear) → review+
(Assignee)

Comment 43

6 years ago
Comment on attachment 567458 [details] [diff] [review]
Iterate over files to make sure they exist

https://hg.mozilla.org/build/buildbotcustom/rev/2d8e38884aa5
Attachment #567458 - Flags: checked-in+
(Assignee)

Comment 44

6 years ago
Valgrind results appeared last night:

https://tbpl.mozilla.org/?rev=0a5e72d1b479

The builds are currently broken, but devs can see the builds and logs now and can hopefully get them greened up.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

6 years ago
Flags: needs-reconfig?
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.