Closed Bug 800435 Opened 12 years ago Closed 11 years ago

Unhide green Valgrind tbpl builds

Categories

(Tree Management Graveyard :: Visibility Requests, defect)

Platform: All, Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: gkw, Unassigned)

References

Details

(by unhide I mean take them off the noignore section)
Done.  For future reference, you can click on Tree Info and then Adjust Hidden Builders and do it yourself.
Assignee: nobody → ehsan
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Ehsan Akhgari [:ehsan] from comment #2)
> Done.  For future reference, you can click on Tree Info and then Adjust
> Hidden Builders and do it yourself.

I don't have a sheriff password.

Please also unhide "Linux x86-64 mozilla-central valgrind".
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #3)
> (In reply to Ehsan Akhgari [:ehsan] from comment #2)
> > Done.  For future reference, you can click on Tree Info and then Adjust
> > Hidden Builders and do it yourself.
> 
> I don't have a sheriff password.

Lies!  ;-)

> Please also unhide "Linux x86-64 mozilla-central valgrind".

Done.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
Um. A job which is not hidden on mozilla-central is a tier 1 job which may not be broken, and when it fails the cause must be immediately backed out.

Valgrind doesn't run on try, it doesn't run on inbound, it doesn't run on fx-team, it doesn't run on services-central, it doesn't run on-push, it is not tier-1.

Rehidden.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
> Valgrind doesn't run on try, it doesn't run on inbound, it doesn't run on
> fx-team, it doesn't run on services-central, it doesn't run on-push, it is
> not tier-1.
> 
> Rehidden.

Valgrind builds can eventually be made to run on try and all the other branches, but it will take up a lot of resources to be run on-push, and we never planned for it to be run on-push.

Does this mean it can never be tier-1, and never be unhidden?
No, the other option would be to make it the first thing ever which is tier 1 despite running only once a day (desktop nightly builds don't really count, because they are 99% identical to jobs that run throughout the day, and the other 1% is absolutely essential even if it's miserable to have it broken by something within the last 24 hours).

Well, the first thing since the Netscape days - from what I hear, they used to close the tree to build nightly builds, manually test them, and then chase after people who committed during the previous day to figure out who had broken what, because they didn't have on-push tests to speak of. Seemed like a pretty miserable way to develop, to me.

But that would be fairly similar to what would need to happen with a 3am nightly-only tier 1 job - at 4am, edmorley would close mozilla-central, close mozilla-inbound since that would almost certainly be where the bustage had come from, and begin bisecting, all the while leaving mozilla-central and mozilla-inbound closed. Of course, it would be pointless to have him do the bisecting, since Valgrind is not his field of expertise, and it would be pointless to have mozilla-central and mozilla-inbound closed, since the reason you close a tree when it is busted is to prevent adding more undetectable bustage, but piling bustage on bustage is exactly what a once-a-day job is all about. So, you could make it the only visible thing which does not require an immediate backout or tree closure, and does not require the sheriff to figure out what busted it, at which point... why is it visible again?
So you're saying that a test can/should only be visible if it's on-push?
The way to deal with Valgrind issues could be to file a bug and add a suppression for the new bug, then (get Releng to) retrigger only Valgrind builds after the suppression is landed as a DONTBUILD.

Assuming it is the only problem found, the retriggered Valgrind build should be green again, and we'd have a bug on file to chase down the problem.

Just giving a bit of overview here.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> So you're saying that a test can/should only be visible if it's on-push?

Up until now, pretty much yes (PGO is every 3/6 hours depending on tree; Nightly is unavoidable + 99% similar as philor said). They also need to run on Try + all trunk trees that merge into mozilla-central. 

I've thought for a while it would be useful to have this documented somewhere, since it comes up most times we add something new to TBPL that isn't running on all trees (last one was Marionette iirc). 

/me adds to pile of sheriffing things to update on the wiki.
Gary has a good point in comment 9 -- this case is a bit different because there's a mechanical process for getting the test green again while yielding a bug on file.  IIRC we run with --gen-suppressions=yes so Valgrind even spits out the necessary suppression.
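A rough sketch of that mechanical step (pulling the generated suppressions out of a log so one can be filed and landed), assuming Valgrind printed each one as a bare brace-delimited block whose first inner line is its standard placeholder `<insert_a_suppression_name_here>`. The helpers `extract_suppressions` and `name_suppression` are hypothetical illustrations, not part of Mozilla's build tooling, and real buildbot logs may add per-line prefixes this does not handle:

```python
from __future__ import annotations

# Hypothetical helpers, not Mozilla tooling. Assumes each generated
# suppression appears in the log as a bare block:
#   {
#      <insert_a_suppression_name_here>
#      Memcheck:...
#      fun:...
#   }
PLACEHOLDER = "<insert_a_suppression_name_here>"


def extract_suppressions(log_text: str) -> list[str]:
    """Return the distinct generated suppression blocks, in first-seen order."""
    blocks: list[str] = []
    seen: set[str] = set()
    current: list[str] | None = None
    for raw in log_text.splitlines():
        line = raw.rstrip()
        if line.strip() == "{":
            # An opening brace starts a new candidate block, discarding any
            # earlier block that was never closed.
            current = [line]
        elif current is not None:
            current.append(line)
            if line.strip() == "}":
                # Keep only blocks that carry Valgrind's placeholder name,
                # and report each distinct suppression once.
                if len(current) > 2 and current[1].strip() == PLACEHOLDER:
                    block = "\n".join(current)
                    if block not in seen:
                        seen.add(block)
                        blocks.append(block)
                current = None
    return blocks


def name_suppression(block: str, name: str) -> str:
    """Replace the placeholder with a real name, e.g. a bug reference."""
    return block.replace(PLACEHOLDER, name, 1)
```

Deduplicating matters because the same error can be reported many times in one run; each distinct block would then be named after its newly filed bug (the naming scheme is up to whoever lands it) and appended to the suppression file.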
I agree that it's more of a grey area than for other cases, but it still puts extra load on our (other than myself) volunteer sheriffs, who until now have not had to actively check things into mozilla-central on a daily basis just to maintain the green status-quo. 

That said, given that the Valgrind builds only run once a day & will presumably complete within my timezone, I guess perhaps it may not affect other sheriffs as much (other than weekends). I also don't have any idea how often it will turn red - if infrequently, then perhaps we are worrying over nothing (especially if it's just a case of copy-pasta-ing a snippet from the log & filing a short bug).

Philor, RyanVM, thoughts?
Is there some reason to unhide it?  If any time it goes orange, you are just going to push an autogenerated patch to ignore the failure, I'm not sure what the value is in showing it.
(In reply to Andrew McCreight [:mccr8] from comment #13)

What he said. If we're pushing changes to the suppression file to keep things green, then it seems that we're effectively ignoring it anyway. Seems like it would just end up causing confusion amongst people to have it showing. I agree that it's probably best left hidden by default unless we get to a point of running it at the same frequency as all of our other regression tests.
That said, maybe there's a compromise here. If enough machine resources could be allocated to run Valgrind builds on the main branches (m-c, m-i, m-a, m-b, m-r, m-esr10/17, s-c, fx-team, try maybe optionally), could we have an arrangement similar to PGO builds where they run as often as possible, but not necessarily on each push? That would at least narrow down a regression window to a shorter timeframe.
> Is there some reason to unhide it?

If it's hidden, will anyone notice if it stops being green?

Running the tests more often would be great, if possible.  How long do the tests take?

> I also don't have any idea how often it will turn red - if
> infrequently, then perhaps we are worrying over nothing

Indeed.  If it turns out to happen frequently, we could go to plan B.
> Running the tests more often would be great, if possible.  How long do the
> tests take?

They take < 1 hour now, but that's because they're only running PGO tests. The more tests we run, the longer it would take.

Julian has done tests on our mochitest suite w/ Valgrind and we think it takes about 11-12 hours.

> Indeed.  If it turns out to happen frequently, we could go to plan B.

fwiw, we had another green Valgrind build this morning. I'd say it depends on what the developers land to influence prevalence of redness.
Assignee: ehsan → nobody
> fwiw, we had another green Valgrind build this morning.

And we've had them for the five days since, AFAICT.  

(Well, https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&rev=942ed5747b63 is red today but that looks like "abort: data/toolkit/Makefile.in.i@1b08914858da: no match found!" is the problem -- maybe an infrastructure problem?)
(In reply to Gary Kwong [:gkw, :nth10sd] from comment #17)
> Julian has done tests on our mochitest suite w/ Valgrind and we think it
> takes about 11-12 hours.

Roughly 10 CPU hours and 2.5GB real memory, for x86_64-linux built at
gcc -O2, running on a 3.47 GHz Core i5.

Being able to run Mochitests here would be awesome (for lack of a
better word :).  It routinely picks up new bugs on the once-per-month
basis that I've been running it by hand for a while.  It will take
some effort to get it green, but it'd be well worth the effort.
> (Well,
> https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&rev=942ed5747b63 is
> red today but that looks like "abort:
> data/toolkit/Makefile.in.i@1b08914858da: no match found!" is the problem --
> maybe an infrastructure problem?)

The subsequent rebuilds are all green.
(In reply to comment #20)
> > (Well,
> > https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&rev=942ed5747b63 is
> > red today but that looks like "abort:
> > data/toolkit/Makefile.in.i@1b08914858da: no match found!" is the problem --
> > maybe an infrastructure problem?)
> 
> The subsequent rebuilds are all green.

That's hg repository corruption.
(In reply to Nicholas Nethercote [:njn] from comment #18)
> (Well,
> https://tbpl.mozilla.org/?noignore=1&jobname=valgrind&rev=942ed5747b63 is
> red today but that looks like "abort:
> data/toolkit/Makefile.in.i@1b08914858da: no match found!" is the problem --
> maybe an infrastructure problem?)

That run should have retried, but didn't due to mock; filed bug 802114.
Depends on: 801955
Like any other build/testsuite, displaying by default is dependent on running per push (see bug 801955), which I don't see happening given the resource usage and comments from releng. Please reopen the bug if it does.

In a TBPLv2 world (TBPL rewrite, happening this year), Valgrind is the kind of job that could be made more visible if we implement team-specific view modes etc. In the meantime, you'll just need to use &noignore=1.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → INCOMPLETE
No longer blocks: valgrind-on-tbpl
Depends on: valgrind-on-tbpl
Product: Webtools → Tree Management
Component: TBPL → Visibility Requests
Product: Tree Management → Tree Management Graveyard