Closed
Bug 831491
(asan-tests)
Opened 13 years ago
Closed 12 years ago
run tests on ASAN builds
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joduinn, Unassigned)
References
Details
(Keywords: sec-want)
Attachments
(8 files)
11.50 KB, patch | rail: review+ | catlee: checked-in+
1.58 KB, patch | rail: review+ | catlee: checked-in+
830 bytes, patch | rail: review+ | catlee: checked-in+
3.58 KB, patch | rail: review+ | catlee: checked-in+
1.34 KB, patch | rail: review+ | catlee: checked-in+
2.05 KB, patch | rail: review+ | catlee: checked-in+
2.45 KB, patch | rail: review+ | catlee: checked-in+
1.28 KB, patch | catlee: review+ | rail: checked-in+
Spinning out from bug#753148.
This bug is to track figuring out *which* testsuites to run, on *which* ASAN builds, on *which* branches. (I think I remember :decoder saying tests on opt nightly builds only for now and then revisit later, but that was a few hours ago, so asking here to verify). Once we have this list of testsuites, please reassign back.
It would be helpful to have an idea of the duration and frequency of running testsuites on ASAN builds, to help us with capacity planning.
To set expectations, note that currently our test machines are very overloaded, so enabling additional testload needs careful evaluation until we get more machines, or can offload other existing test suites.
Comment 1•13 years ago
As discussed in the meeting, if there are resource problems with running the tests, we would be very happy if we could at least run the tests on optimized builds (they're a lot faster and therefore will consume fewer resources) and only once a day.
Furthermore, we don't need to run all tests. Currently we run the following set:
reftest,crashtest,xpcshell,jsreftest,mochitests
Let me know if anything else is required :)
Updated•13 years ago
Comment 2•13 years ago
builds: Opt Linux64 on mozilla-central branch
Tests: reftest, crashtest, xpcshell, jsreftest, mochitests (from comment 1)
Comment 3•13 years ago
Assigning to John to see what the next steps are to make this happen. The tests run fine manually. There are two known broken tests (see "depends on" list) that we can disable if necessary to get this landed, but of course fixing the tests would be better.
Assignee: choller → joduinn
Comment 4•13 years ago
There are some test failures right now but no blocking issues. The same set of tests already runs daily via a scheduled try push, which can then be disabled.
Comment 5•13 years ago
As discussed, here are the times that the different tests take (extracted from a try run):
mochitest-1: elapsed: 17 mins, 28 secs
mochitest-2: elapsed: 19 mins, 10 secs
mochitest-3: elapsed: 37 mins, 51 secs
mochitest-4: elapsed: 12 mins, 40 secs
mochitest-5: elapsed: 20 mins, 23 secs
mochitest-o: elapsed: 19 mins, 5 secs
mochitest-bc: elapsed: 1 hrs, 6 mins, 39 secs
crashtest: elapsed: 13 mins, 10 secs
reftest: elapsed: 34 mins, 21 secs
jsreftest: elapsed: 21 mins, 28 secs
xpcshell: elapsed: 57 mins, 49 secs
Comment 6•13 years ago
(In reply to Christian Holler (:decoder) from comment #5)
> As discussed, here are the times that the different tests take (extracted
> from a try run):
Ok, I'm comparing these to end-to-end runs of the same jobs on an Ubuntu 64 opt build off of mozilla-central. So, this is the timing for the entire job (including setup and teardown) because that is what matters when it comes to capacity planning - i.e. how quickly can we start and finish a job.
>
>
> mochitest-1: elapsed: 17 mins, 28 secs
mochitest-1 on Ubuntu 64 is usually around 24 minutes. Not sure how on earth you're faster on an ASAN build than that.
> mochitest-2: elapsed: 19 mins, 10 secs
This is usually 9 minutes
> mochitest-3: elapsed: 37 mins, 51 secs
This is usually 23 minutes
> mochitest-4: elapsed: 12 mins, 40 secs
Usually 6 minutes
> mochitest-5: elapsed: 20 mins, 23 secs
Usually 11 minutes
> mochitest-o: elapsed: 19 mins, 5 secs
Usually 16 mins
> mochitest-bc: elapsed: 1 hrs, 6 mins, 39 secs
Usually 34 mins
> crashtest: elapsed: 13 mins, 10 secs
Usually 7 mins
> reftest: elapsed: 34 mins, 21 secs
Usually 28 mins
> jsreftest: elapsed: 21 mins, 28 secs
Usually 10 mins
> xpcshell: elapsed: 57 mins, 49 secs
Usually about 32 mins
We've often said that an ASAN build is about the speed of a debug build (roughly 2x slower than an opt build), and these numbers seem to bear that out. Given that we should NOT run Talos tests on an ASAN build, I don't think it would be a bad thing to turn these on per-push for linux 64. Since linux is a platform we can put in the cloud, that would provide some amount of useful data to developers about their patches with the least amount of impact to the overall automation infrastructure.
I would encourage you (decoder) to dig into that mochitest-1 number and understand why you're *faster* on an ASAN build than we generally are. Did the mochitest suite in question crash half-way through or something?
But otherwise, I don't see any reason why we couldn't move ahead with this per-push for linux 64 platforms given that we can virtualize most of that impact.
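The per-suite comparison in comments 5 and 6 can be reproduced with a short calculation. The sketch below is illustrative only: the numbers are copied from the two comments, while the names (`ASAN_ELAPSED`, `OPT_USUAL_MINUTES`, `slowdown_ratios`, `overall_slowdown`) are made up for this example. Note that comment 5 reports test elapsed time while comment 6 reports end-to-end job time (including setup and teardown), so these ratios probably understate the real slowdown.

```python
# Elapsed times from comment 5 (ASan opt, try run) as (hours, minutes, seconds).
# Only the numbers come from the thread; all names here are illustrative.
ASAN_ELAPSED = {
    "mochitest-1": (0, 17, 28),
    "mochitest-2": (0, 19, 10),
    "mochitest-3": (0, 37, 51),
    "mochitest-4": (0, 12, 40),
    "mochitest-5": (0, 20, 23),
    "mochitest-o": (0, 19, 5),
    "mochitest-bc": (1, 6, 39),
    "crashtest": (0, 13, 10),
    "reftest": (0, 34, 21),
    "jsreftest": (0, 21, 28),
    "xpcshell": (0, 57, 49),
}

# "Usual" end-to-end job times in minutes from comment 6 (Ubuntu 64 opt).
OPT_USUAL_MINUTES = {
    "mochitest-1": 24,
    "mochitest-2": 9,
    "mochitest-3": 23,
    "mochitest-4": 6,
    "mochitest-5": 11,
    "mochitest-o": 16,
    "mochitest-bc": 34,
    "crashtest": 7,
    "reftest": 28,
    "jsreftest": 10,
    "xpcshell": 32,
}


def to_seconds(hms):
    """Convert an (hours, minutes, seconds) tuple to seconds."""
    hours, minutes, seconds = hms
    return hours * 3600 + minutes * 60 + seconds


def slowdown_ratios():
    """Return ASan time divided by usual opt time, per suite."""
    return {
        suite: to_seconds(ASAN_ELAPSED[suite]) / (OPT_USUAL_MINUTES[suite] * 60)
        for suite in ASAN_ELAPSED
    }


def overall_slowdown():
    """Return total ASan time divided by total usual opt time."""
    asan_total = sum(to_seconds(t) for t in ASAN_ELAPSED.values())
    opt_total = sum(minutes * 60 for minutes in OPT_USUAL_MINUTES.values())
    return asan_total / opt_total


if __name__ == "__main__":
    for suite, ratio in sorted(slowdown_ratios().items(), key=lambda kv: kv[1]):
        note = "  <-- faster than opt, worth investigating" if ratio < 1 else ""
        print(f"{suite:13s} {ratio:4.2f}x{note}")
    print(f"{'overall':13s} {overall_slowdown():4.2f}x")
```

With these figures the aggregate comes out near 1.6x, and mochitest-1 is the only suite below 1x, which is the anomaly this comment asks about.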
Comment 7•13 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #6)
>
> I would encourage you (decoder) to dig into that mochitest-1 number and
> understand why you're *faster* on an ASAN build than we generally are. Did
> the mochitest suite in question crash half-way through or something?
You're right, this is a more recent regression in M1 and the crash is in the WebGL testsuite. I'll try to explicitly disable the faulty tests and do another measurement. There also seem to be some random oranges on M-oth that could interfere here, I'll see if I can disable these too and get them on file. This shouldn't block us though from pushing the tests forward in the meantime.
Comment 8•12 years ago
I've disabled the WebGL tests in a try push (we need to re-enable them when we have upgraded our Mesa version) and mochitest-1 runs in 36 minutes now.
I'm also working actively on resolving another timeout on mochitest-bc, which might be due to OOM (the cloud linux machines seem to have more memory, enough to not trigger my "low-memory" configuration, but still not enough to run in default mode). Testing this now.
There are also two more orange bugs open right now, but we can easily disable these tests until we have a fix.
Clint, what would be the next steps to get this on mozilla-inbound and have tests enabled? Or would we first enable them on m-c only and then move to inbound? The ultimate goal is to have the tests + unhide them on tbpl (which is not going to happen with m-c only).
Flags: needinfo?(ctalbert)
Updated•12 years ago
Alias: asan-tests
| Reporter | ||
Comment 9•12 years ago
:decoder:
Some questions while mtg w/ctalbert just now:
1) Before we start enabling these new tests on mozilla-inbound/b2g-inbound/fx-team, I recommend we get these working and all green on lower-traffic branch such as a project branch or mozilla-central first. Once these are all green, we can enable the tests on other high-volume branches.
2) The changes to what tests are run on what branches, are handled within buildbot scheduling logic. This bug is in the correct component for that once all dep.bugs are resolved.
3) Thanks for the runtimes in comment#5, comment#6. The question about what *frequency* of tests still unresolved - how often do you *need* these run? Is once-per-nightly enough? Given current infrastructure load, we can only support running additional tests like this on virtualized OS like ubuntu (not physical OS like fedora), and either way there's a financial $$$ cost to balance here.
Assignee: joduinn → nobody
Flags: needinfo?(ctalbert)
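The frequency question in point 3 is at heart a machine-time calculation: per-run suite time multiplied by runs per day. A minimal sketch, assuming the comment 5 elapsed times are representative (comment 8 later revises mochitest-1 to 36 minutes once the crashing WebGL tests are disabled, so treat the total as a lower bound). The function name and the `runs_per_day` parameter are hypothetical, and no real push counts or AWS prices appear in this bug.

```python
# Per-suite elapsed seconds from comment 5 (ASan opt build, try run).
# The mochitest-1 figure comes from a run that crashed part-way, so the
# total below is a lower bound.
ASAN_SUITE_SECONDS = {
    "mochitest-1": 1048,
    "mochitest-2": 1150,
    "mochitest-3": 2271,
    "mochitest-4": 760,
    "mochitest-5": 1223,
    "mochitest-o": 1145,
    "mochitest-bc": 3999,
    "crashtest": 790,
    "reftest": 2061,
    "jsreftest": 1288,
    "xpcshell": 3469,
}


def machine_hours_per_day(runs_per_day, suite_seconds=ASAN_SUITE_SECONDS):
    """Estimate test-machine hours per day for a given number of full runs.

    runs_per_day is an assumption supplied by the caller: 1 for a nightly-only
    schedule, or the branch's daily push count for a per-push schedule.
    Setup and teardown time are not included.
    """
    return runs_per_day * sum(suite_seconds.values()) / 3600.0


if __name__ == "__main__":
    print(f"nightly only: {machine_hours_per_day(1):.2f} machine-hours/day")
```

Calling `machine_hours_per_day` with a branch's actual daily push count would give the per-push figure to weigh against the cost concern raised above.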
Comment 10•12 years ago
(In reply to John O'Duinn [:joduinn] from comment #9)
> :decoder:
>
>
> Some questions while mtg w/ctalbert just now:
> 1) Before we start enabling these new tests on
> mozilla-inbound/b2g-inbound/fx-team, I recommend we get these working and
> all green on lower-traffic branch such as a project branch or
> mozilla-central first.
I've done a try push today and it's almost green now. Getting it entirely green on mozilla-central/inbound is just a matter of days now. Once that is green, we can enable them on mozilla-central and then on mozilla-inbound (I recommend doing that quickly though before people start introducing more regressions again).
>
> 3) Thanks for the runtimes in comment#5, comment#6. The question about what
> *frequency* of tests still unresolved - how often do you *need* these run?
> Is once-per-nightly enough?
The sheriffs say we can only unhide the tests on tbpl if we have them on mozilla-inbound such that we can easily identify the regressing changeset.
> Given current infrastructure load, we can only
> support running additional tests like this on virtualized OS like ubuntu
> (not physical OS like fedora), and either way there's a financial $$$ cost
> to balance here.
I guess just Ubuntu is fine :) About the costs, this is something managers need to work out. Dan Veditz told me we have support for doing this, so I assume we also have the financial support :) I'm just driving the technical side of this.
Thanks!
Comment 11•12 years ago
So with regard to the requirement from the sheriffs, I totally understand where they are coming from. I also understand Joduinn's concerns about turning on a test system that will consume our test slaves for twice as long as normal. There has to be a compromise here.
How often do we expect ASAN builds to show regressions that are worthy of a backout? If they are going to be mostly green, then what if we run it on m-c for each push, and m-i and other integration branches every 4 hours? How stable have the ASAN tests been to date?
We need a way to turn them on in the short term, and in order to do that, we need to find a way to make them work without massive impacts to our current infrastructure load.
For the long term solution of running them per push in the cloud, we need to get more money allocated to our Amazon bill. For that, I'd recommend Dveditz start a thread with me, joduinn, and bmoss and make the case with regard to what having per push ASAN builds on every tree will buy us in the long term. (I'd write it with an eye toward what it will save us on having to do security re-spins). I'm happy to help edit it if you email me directly, but I don't want to try to make your case for you.
Ed, flagging you for more info on how we can turn these on in a way that won't severely impact slave wait times and will still allow the sheriffs to ascertain what went wrong when these tests highlight an issue. (See my proposal about three paragraphs above).
Flags: needinfo?(emorley)
Comment 12•12 years ago
ASan tests should be stable enough for that; it's not as if there is a new failure every hour (not even every day). It's just that when it happens, sheriffs need to be able to blame someone without going through a full merge, I guess.
Comment 13•12 years ago
(In reply to comment #11)
> How often do we expect ASAN builds to show regressions that are worthy of a
> backout?
This is a very difficult question to answer without guessing. I don't think we can reliably answer this question for any of our other test suites, FWIW. And given the fact that these tests are stable, failures only in ASAN tests are probably more serious than the average failure that we back out stuff for these days.
| Assignee | ||
Updated•12 years ago
Product: mozilla.org → Release Engineering
Comment 14•12 years ago
(Commenting here at decoder's request)
I'm fine with Clint's proposal. We already have a precedent in that PGO builds are only run on a set frequency and we've managed to survive thus far. Ultimately, it's hard to judge whether this will work for ASAN or not - we don't know what we don't know :). I'm open to trying it as proposed for now as long as we remain open to increasing the frequency should we find it to be unworkable for whatever reason.
Comment 15•12 years ago
Asan builds only run on one platform (Fedora64) x opt+debug (100 and 150 mins respectively). This is relatively little load compared to all the other builds and test runs combined. As such, why are we worried this will increase load unnecessarily?
More importantly, why are we singling out ASan builds in particular for "we think they won't fail often, so let's not run them all the time"? The same could be said of many of the unit tests running on all three variants of OS X for example, which account for much more machine time.
Until we have the architecture in place to make regression hunting easy (eg bisect in the cloud) I'm really quite averse to running ASan builds at anything other than per push, given the little infra load saving. Once bisect in the cloud is up and running, we'll be able to save on ASan and many more suites combined, making this look like a drop in the ocean... :-)
Flags: needinfo?(emorley)
Comment 16•12 years ago
Oh tests, mis-read prior comments (sorry still playing catch up, got back late from the ER, yey). In which case then yeah maybe we just need to run these periodically on non-mozilla-central trees at least until we know if they fail too frequently for that to be viable.
Comment 17•12 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #16)
> Oh tests, mis-read prior comments (sorry still playing catch up, got
> back late from the ER, yey). In which case then yeah maybe we just need to
> run these periodically on non-mozilla-central trees at least until we know
> if they fail too frequently for that to be viable.
Why non-mozilla-central? We are already creating builds for mozilla-central and have been working quite hard to get it green there because the next step was supposed to be enabling them on mozilla-central. And if that is stable, mozilla-inbound/unhiding on tbpl.
Comment 18•12 years ago
Per push mozilla-central, periodically for non-mozilla-central.
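The proposal at this point in the thread (per push on mozilla-central, periodic elsewhere) amounts to a small policy table. The sketch below is hypothetical and is not buildbot-configs syntax: the names `ASAN_TEST_POLICY` and `should_schedule` are invented, the branch list is taken from comment 9, and the four-hour interval comes from the suggestion in comment 11.

```python
# Hypothetical policy table illustrating the proposal in comments 11-18.
# This is NOT real buildbot scheduler configuration.
PER_PUSH = "per-push"
PERIODIC = "periodic"

# "Every 4 hours" is the interval floated in comment 11.
PERIODIC_INTERVAL_SECONDS = 4 * 60 * 60

ASAN_TEST_POLICY = {
    "mozilla-central": PER_PUSH,
    "mozilla-inbound": PERIODIC,
    "b2g-inbound": PERIODIC,
    "fx-team": PERIODIC,
}


def should_schedule(branch, seconds_since_last_run):
    """Decide whether a new push on `branch` should trigger ASan tests.

    Branches not listed in the policy table never get ASan tests.
    """
    policy = ASAN_TEST_POLICY.get(branch)
    if policy is None:
        return False
    if policy == PER_PUSH:
        return True
    # Periodic branches only run once the interval has elapsed.
    return seconds_since_last_run >= PERIODIC_INTERVAL_SECONDS
```

For example, `should_schedule("mozilla-inbound", 3600)` is false because only one hour has passed, while any push to mozilla-central triggers a run.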
Updated•12 years ago
OS: Mac OS X → Linux
Hardware: x86 → All
Comment 19•12 years ago
ASan is now green on mozilla-central:
https://tbpl.mozilla.org/?tree=Try&rev=01a5d7808288
(The orange Build is ok, that's just because of the way this was pushed to try. The builds in the ASan build job are green of course).
| Reporter | ||
Comment 20•12 years ago
Per meeting w/dveditz, :decoder and joduinn just now:
(In reply to John O'Duinn [:joduinn] from comment #9)
> :decoder:
>
> Some questions while mtg w/ctalbert just now:
> 1) Before we start enabling these new tests on
> mozilla-inbound/b2g-inbound/fx-team, I recommend we get these working and
> all green on lower-traffic branch such as a project branch or
> mozilla-central first. Once these are all green, we can enable the tests on
> other high-volume branches.
decoder now has some testsuites running green for ASAN builds on ubuntu64, specifically: reftest, crashtest, xpcshell, jsreftest, mochitests. To make sure these tests *stay* green, we'd like to enable them on other branches, so developers can see bustages, and sheriffs can do backouts-as-needed.
This needs matching changes to trychooser, to enable running these ASan-builds, and tests-on-ASan-builds. Bug#847973 tracks getting those changes into trychooser. Bug#887641 tracks supporting those builds on Try, as not-default.
These tests-on-opt-asan builds are slower than usual tests-on-opt builds, and are approx same as debug builds (see comment#5, comment#6 for details). There is no need to run these ASAN tests on debug+asan builds, as these would be *super* slow.
> 2) The changes to what tests are run on what branches, are handled within
> buildbot scheduling logic. This bug is in the correct component for that
> once all dep.bugs are resolved.
For sheriffs to be able to support these builds+tests on mozilla-central, we also need to have these builds + tests on all 3 inbounds (mozilla-inbound, b2g-inbound, fx-team) and non-default-on-try. Ideally, security folks would like ASan builds+tests to be run per-checkin, so let's start with this, as this is most helpful to developers doing landings.
Other project branches may also choose to have ASan builds, but they should file bugs asking for them as/when needed.
> 3) Thanks for the runtimes in comment#5, comment#6. The question about what
> *frequency* of tests still unresolved - how often do you *need* these run?
> Is once-per-nightly enough? Given current infrastructure load, we can only
> support running additional tests like this on virtualized OS like ubuntu
> (not physical OS like fedora), and either way there's a financial $$$ cost
> to balance here.
Because these are running on Ubuntu64 (on AWS), not Fedora64 (physical hardware), we can enable this without impacting other test jobs. If we find this $$$ on AWS to be a problem, we could reduce cadence - maybe to the same cadence as the PGO builds for windows? It may also be possible to run just once per night on the nightly ASan build, and then whenever a problem is detected, have sheriffs file a bug with the regression range to the previous good nightly, and let developers or security folks figure it out using try?
Depends on: 847973
Comment 21•12 years ago
What's the hold up here? Let's get this stood up in AWS. Per comment 1 once a day is fine, and over the course of the 7 months this has been open we must have figured out which tests need to be run and fixed whatever tests needed to be fixed. I cannot imagine that running this once a day is going to substantially impact our overall AWS bill. We can revisit this if I am wrong. Is it really a requirement in the short term to stand up trychooser (there does seem to be a workaround for the short run), or can that follow?
Comment 22•12 years ago
(In reply to Bob Moss :bmoss from comment #21)
> What's the hold up here? Let's get this stood up in AWS. Per comment 1 once
> a day is fine
In the meeting with joduinn today, we discussed how to proceed and this is going to be put into production soon. Per the comments above, it won't be sufficient to do this once a day. Rather, we will be starting per push because otherwise we cannot unhide this on TBPL.
There is also a meeting scheduled now for Tuesday in two weeks in case this hasn't been put into production by then. Thanks!
Comment 23•12 years ago
Do we need tests on both the opt and debug asan builds?
Comment 24•12 years ago
(In reply to Chris AtLee [:catlee] from comment #23)
> Do we need tests on both the opt and debug asan builds?
Just opt:
(In reply to John O'Duinn [:joduinn] from comment #20)
> These tests-on-opt-asan builds are slower than usual tests-on-opt builds,
> and are approx same as debug builds (see comment#5, comment#6 for details).
> There is no need to run these ASAN tests on debug+asan builds, as these
> would be *super* slow.
Comment 25•12 years ago
Ah, missed that, thanks!
Comment 26•12 years ago
Should we run jetpack tests on these builds?
Comment 28•12 years ago
Attachment #797420 -
Flags: review?(rail)
Updated•12 years ago
Attachment #797416 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #797420 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #797420 -
Flags: checked-in+
Comment 29•12 years ago
Attachment #797428 -
Flags: review?(rail)
Updated•12 years ago
Attachment #797428 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #797428 -
Flags: checked-in+
Comment 30•12 years ago
Comment on attachment 797416 [details] [diff] [review]
get asan tests running on cedar, try
should be live on cedar/try on the next reconfig
Attachment #797416 -
Flags: checked-in+
Comment 31•12 years ago
In production.
Comment 32•12 years ago
Attachment #797922 -
Flags: review?(rail)
Updated•12 years ago
Attachment #797922 -
Flags: review?(rail) → review+
Comment 33•12 years ago
noticed these builder types are missing from watch_pending.cfg too
Attachment #797924 -
Flags: review?(rail)
Updated•12 years ago
Attachment #797924 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #797922 -
Flags: checked-in+
Updated•12 years ago
Attachment #797924 -
Flags: checked-in+
Comment 34•12 years ago
The patches here have been merged to production (and presumably reconfiged), but the tests aren't being scheduled:
https://tbpl.mozilla.org/?showall=1&jobname=asan
catlee, any ideas? :-)
Flags: needinfo?(catlee)
Comment 36•12 years ago
Oh, misread earlier comments to mean the patches were for all trees; can see the patch description states the opposite, sorry! :-)
Comment 37•12 years ago
enable tests on m-c, and disable builds/tests on cedar
Attachment #799112 -
Flags: review?(rail)
Updated•12 years ago
Attachment #799112 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #799112 -
Flags: checked-in+
Comment 38•12 years ago
Latest patch is in production.
Comment 39•12 years ago
All done! Jobs are running (but hidden) on tbpl:
https://tbpl.mozilla.org/?showall=1&jobname=asan
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 40•12 years ago
(In reply to John O'Duinn [:joduinn] from comment #20)
> For sheriffs to be able to support these builds+tests on mozilla-central, we
> also need to have these builds + tests on all 3 inbounds (mozilla-inbound,
> b2g-inbound, fx-team) and non-default-on-try. Ideally, security folks would
> like ASan builds+tests to be run per-checkin, so let's start with this, as
> this is most helpful to developers doing landings.
Needed on more than mozilla-central :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 41•12 years ago
To be honest we might as well enable them for all trunk-matching repos:
1) mozilla-central + 3xinbounds will be 95% of the trunk push count, so we won't exactly save much by leaving them off elsewhere (and will only have bad surprises when merging into m-c from project repos otherwise).
2) the asan test jobs are actually quicker than debug runs, and we're only doing them on one platform - so in the grand scheme of things I still think we're worrying over nothing.
Comment 42•12 years ago
And then once you think about writing yet another loop to remove the job from release branches, this probably ought to have a follow-the-trains loop, because we don't really want to properly use memory on the trunk, and then screw up in an insecure way while porting a patch to a release branch, and find out that we did by paying yet another bounty to someone who does run ASan on release branches.
Comment 43•12 years ago
Attachment #800749 -
Flags: review?(rail)
Updated•12 years ago
Attachment #800749 -
Flags: review?(rail) → review+
Updated•12 years ago
Attachment #800749 -
Flags: checked-in+
Comment 44•12 years ago
something here is in production
Comment 45•12 years ago
Both builds and tests are going on other trees, and are now unhidden on m-c, so _everything_ here is in production!
Status: REOPENED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED
Comment 46•12 years ago
Attachment #804464 -
Flags: review?(catlee)
Updated•12 years ago
Attachment #804464 -
Flags: review?(catlee) → review+
Comment 47•12 years ago
Comment on attachment 804464 [details] [diff] [review]
buildapi
https://hg.mozilla.org/build/buildapi/rev/358a04471ef1
Attachment #804464 -
Flags: checked-in+
Comment 48•12 years ago
We've just had ASan only test breakage, that was easily identifiable due to us doing per push builds \o/ :-)
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=813a35c5b24a
Comment 49•12 years ago
(In reply to Ed Morley [:edmorley UTC+1] from comment #48)
> We've just had ASan only test breakage, that was easily identifiable due to
> us doing per push builds \o/ :-)
>
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=813a35c5b24a
Thanks for letting us know! Philor also told me about this. Is it possible that the sheriffs could record such cases when they see them? At least for a while. We are of course interested in how many failures we're catching now that would otherwise have been missed or maybe identified later. That would be super awesome.
Comment 50•12 years ago
Yup we can do :-)
CCing the remaining sheriffs not yet CCed - please see comment 49 :-)
Comment 51•12 years ago
For future reference, that was bug 918041, and bug 895091 comment 106 shows a backout.
| Assignee | ||
Updated•8 years ago
Component: General Automation → General