Closed
Bug 463262
Opened 16 years ago
Closed 14 years ago
run unit test on non-sse and ppc machines
Categories
(Release Engineering :: General, defect, P2)
Tracking
(blocking2.0 beta2+, status1.9.1 ?)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: jhford)
Attachments
(3 files)
302.00 KB, image/png
7.60 KB, patch (mozilla: review+, jhford: checked-in+)
6.41 KB, patch (mozilla: review+, jhford: checked-in+)
This is for non-Intel chipsets that have issues with the added instruction sets.
Comment 1•16 years ago
sayrer/shaver:
do you want us to *build* and test on non-SSE VM?
...or...
do you want us to take an existing build from an SSE VM and *test* on non-SSE VM?
Comment 2•16 years ago
(In reply to comment #1)
> do you want us to take an existing build from an SSE VM and *test* on non-SSE
> VM?
We want to test a standard (SSE-capable) build on a non-SSE machine. If we can build with SSE on a machine without it, that's fine, but we don't want a non-SSE build, just non-SSE runtime.
Reporter
Updated•16 years ago
Summary: Create non-SSE build for testing → setup win32-sse builds to automatically run unittests on both win32-non-sse VMs and win32-sse VMs
Reporter
Updated•16 years ago
Priority: -- → P2
Comment 3•16 years ago
Tweaking summary, as this is needed for linux *and* win32.
Per discussions with Damon, running this once a night would be plenty. No need to schedule per-checkin tests, so no need for a larger pool of slaves.
OS: Windows Server 2003 → All
Hardware: PC → All
Summary: setup win32-sse builds to automatically run unittests on both win32-non-sse VMs and win32-sse VMs → setup sse builds to automatically run unittests on both non-sse VMs and sse VMs
Comment 4•16 years ago
(In reply to comment #2)
> (In reply to comment #1)
> > do you want us to take an existing build from an SSE VM and *test* on non-SSE
> > VM?
>
> We want to test a standard (SSE-capable) build on a non-SSE machine. If we can
> build with SSE on a machine without it, that's fine, but we don't want a
> non-SSE build, just non-SSE runtime.
This means we need to be able to run unittests on a pre-existing build, which we cannot yet do. Adding a dependency and moving to Future per discussions with Aki and Ted. (Should have updated this bug weeks ago, but holidays and end-of-quarter overran my brain, sorry.)
Assignee: aki → nobody
Component: Release Engineering → Release Engineering: Future
Depends on: 421611
Comment 6•16 years ago
(In reply to comment #4)
> >
> > We want to test a standard (SSE-capable) build on a non-SSE machine. If we can
> > build with SSE on a machine without it, that's fine, but we don't want a
> > non-SSE build, just non-SSE runtime.
>
> This means we need to be able to run unittests on pre-existing build, which we
> cannot yet do.
Either build style works.
Flags: blocking1.9.1+
Comment 7•16 years ago
John, who in RelEng owns this?
Comment 8•16 years ago
As I recall, this is blocked on the patch from Ted, no?
Comment 9•16 years ago
Yeah, and the setup of the slaves (bug 465302), from the deps listed above.
Comment 10•16 years ago
OK. So, do we want to hold off on fingering someone as the owner? I just wanna make sure we're all in line to crank through the remaining blockers during RC. I think finding an owner for these bugs (i.e., including bug 465302) would be ideal.
Comment 11•16 years ago
Releng is the owner, with the bug blocked on the unit test bug. Are you looking for an individual owner rather than a group?
Comment 12•16 years ago
Yeah, I'm just looking for someone I can beat with a stick once this bug becomes the last thing blocking 1.9.1. :)
Comment 13•16 years ago
ted first, then releng :-)
Comment 14•16 years ago
I don't think this will work on these VMs, see bug 492589 comment 5. We'll probably have to get physical machines that either have ancient CPUs or allow disabling of SSE features in BIOS.
Comment 15•16 years ago
1) VMs don't support this, even when they claim they do. Details in bug#492589#c5. We have therefore deleted the two *nonsse VMs created for this, as they are useless.
2) ted is now trying to see if anyone in the community has old enough hardware running a non-SSE CPU (a quick CPU-flag check for screening candidate boxes is sketched below).
3) from irc: there was some debate about using QEMU as an emulator, but it was dismissed because of cases where QEMU did not catch problems that crashed on an end user's non-SSE computer.
4) from irc: it seems that the best choice of CPU is an AMD Athlon K7, details here: http://en.wikipedia.org/wiki/Athlon and http://en.wikipedia.org/wiki/SSE2#CPUs_supporting_SSE2. Not sure where we can buy those anymore; some quick websurfing and phone calls were fruitless.
5) I question whether this bug should be "blocking1.9.1", but I don't know how or whom to ask. However, I could possibly see bug#492589 being reopened and marked as "blocking1.9.1", and even that is totally dependent on being able to find the right hardware.
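As a practical aside (not from the original discussion, just a hedged illustration): on Linux you can screen candidate hardware by looking at the CPU flags the kernel reports. A minimal sketch, assuming /proc/cpuinfo is available:

# Hedged sketch: report which SSE-family flags this CPU advertises.
# Useful for checking whether a candidate box lacks sse and/or sse2.
def sse_flags(path="/proc/cpuinfo"):
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return sorted(flag for flag in ("sse", "sse2") if flag in flags)
    return []

if __name__ == "__main__":
    found = sse_flags()
    print("SSE-family flags found: %s" % (", ".join(found) or "none"))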
Comment 16•16 years ago
>
> 5) I question if this bug should be "blocking1.9.1", but dont know how/who to
> ask.
I marked this bug blocking+ on January 7 2009. I think it's still the right thing, unless we don't support this platform anymore. Shaver or justin probably know where to ask.
Updated•16 years ago
Component: Release Engineering: Future → Release Engineering
Comment 17•16 years ago
(In reply to comment #15)
> 5) I question if this bug should be "blocking1.9.1", but dont know how/who to
> ask. However, I could possibly see bug#492589 being reopened and marked as
> "blocking1.9.1", and even that is totally dependent on being able to find the
> right hardware.
Instead of reopening bug#492589 (test manually on VMs), ted filed bug#492589 (test manually on hardware) and marked that "blocking1.9.1". We need to know this can work manually before we try automating anything, so setting as dependent bug.
Depends on: 492589
Comment 18•16 years ago
Based on the success of bug 492589, we can take this one off the blocker list. Discussed this with Sayre and he agrees. We'll need Ted to run the tests manually before each RC and final. If someone disagrees, please re-nom.
Flags: blocking1.9.1+ → blocking1.9.1-
Comment 19•16 years ago
Re-nom: based on the data there, we don't know that bug 492589 was a success (we don't even have a list of the tests, let alone another run to see if the frequency of the randoms is the same!), and we need coverage for m-c and 1.9.1 after 3.5.0 is released.
If we can run it once, isn't it straightforward to set it up in cron or a slave script and have it run all the time, reporting to the TM tree?
Flags: blocking1.9.1- → blocking1.9.1?
Comment 20•16 years ago
Also, note that bug 492589 is now a blocker. But, we probably do need multiple runs to see frequency of randoms (I like that phrase).
Comment 21•16 years ago
Can we get a blocking decision here, one way or another, please?
The original reporter is apparently satisfied (see comment 18), but that's based on the assumption that bug 492589 (manually running the tests) was a success. That might not be a big deal, as that bug does block, meaning that we will at least have an answer about unittests on SSE builds.
I can't comment on what Shaver's asking for in comment 19, but from my product driving side comes a request to ensure that we test builds before cutting for RC in a way that lets us know if we're going to break on SSE. If that's already covered by bug 492589, then I don't think this bug blocks our release of Firefox 3.5, though it should obviously be up there on the to-do list for releng.
Comment 22•16 years ago
I would mark blocking 3.5.1 if I could -- we need automation here ASAP, but if we're willing to burn some ted-cycles on the manual runs (probably need several if there are "randoms" in play, and may want a valgrind run of the suite as well) then I'm OK with that.
Flags: wanted1.9.1.x?
Flags: wanted1.9.1+
Flags: blocking1.9.2+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Comment 23•15 years ago
Found during triage, assigning to joduinn for investigation.
Assignee: nobody → joduinn
Comment 24•15 years ago
Where are we here? Do we need this for 1.9.1 still or just 1.9.2? Has any work been done in the last 7 weeks?
status1.9.1:
--- → ?
Flags: wanted1.9.1.x?
Comment 25•15 years ago
I confirmed with rsayrer that this is still needed in advance of 1.9.2 release.
Nothing to do here until blocking bugs are fixed, so putting this bug back in the pool.
Assignee: joduinn → nobody
Assignee
Comment 27•15 years ago
After doing some tests, I can't currently run the PPC test because I don't have any builds that are universal and have symbols. I am going to look at getting windows and linux going tomorrow.
Depends on: 457753
Summary: setup sse builds to automatically run unittests on both non-sse VMs and sse VMs → run unit test on non-sse and ppc machines
Comment 28•15 years ago
What's wrong with these builds?
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx/
Aside from the fact that there are debug builds mixed in there, which should get fixed by some other bug whose number escapes me, those are the builds from "OS X 10.5.2 mozilla-central build", and they are universal builds.
Assignee
Comment 29•15 years ago
Hmm, I seem to have gotten a debug build every other time I have tried. I have tried with http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx/1255468611/firefox-3.7a1pre.en-US.mac.dmg and it seems to be running fine.
Assignee
Comment 30•15 years ago
Unsurprisingly, these machines are very slow. I am getting an unresponsive script warning. Is there a way to disable this prompt?
Assignee
Comment 31•15 years ago
Aki showed me dom.max_script_run_time. I am going to set this to 0 and see if this changes anything.
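For anyone reproducing this setup, a hedged sketch of what that change could look like; the profile path here is a placeholder, not the actual slave configuration. Setting dom.max_script_run_time to 0 removes the script-runtime limit, so the unresponsive-script prompt should not appear:

# Hedged sketch: append the pref to the test profile's user.js.
# profile_dir is a hypothetical path, not the real slave layout.
import os

profile_dir = "/builds/slave/test-profile"
with open(os.path.join(profile_dir, "user.js"), "a") as prefs:
    prefs.write('user_pref("dom.max_script_run_time", 0);\n')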
Comment 32•15 years ago
Shaver: you marked this as blocking; are you saying we can't release without running the tests? Lots of dependencies here are unresolved :(
Comment 33•15 years ago
A bunch of PPC and non-SSE machines were just brought up in the MV server room, and I just bought 2 more non-SSE machines. I guess I am just pointing out that there is progress, but if this is actually blocking, it seems it needs to be a drop-everything for rel-eng, no?
Assignee
Comment 34•15 years ago
I should probably do a quick status update on this. We just landed a large part of the automation/master-side work to trigger test runs, and I have verified that it is triggering test runs. I am currently blocked on getting the slaves up and running. I have our Linux and Leopard slaves working. I am having a little more difficulty with Tiger and Windows. As I understand it, Tiger is low priority and Windows is critical as it covers most of our user base.
I had a setback today in that both of our Windows XP slaves completely gave out. I have installed Windows on one of the new machines and will be configuring it first thing tomorrow.
Currently the test runs are triggered at the completion of nightly builds. Because mozilla-1.9.2 builds [1] do not upload required artifacts (tests and symbols), I can only run mozilla-central [2] tests in a fully automated fashion. The fix would be to start uploading the required symbols and test packages for mozilla-1.9.2 nightlies. I am assuming this is possible because we are uploading these files for the 1.9.2 tinderbox-builds.
I would also like to know how long this testing is going to be required. If it is only going to be until 3.6 is released, it might be acceptable to trigger the jobs manually. If this is something we need going forward for all releases, I can look into the more permanent fixes. If this is going to be required going forward, I would like to know which branches need to be covered.
Of all the bugs that are blocking this, only setting up windows slaves (bug 522379) is actually blocking running these tests against mozilla-central. I will file the bug for getting the mozilla-1.9.2 artifacts uploaded shortly and will add it as blocking this bug.
[1] example: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.2/
[2] example: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/
Assignee
Comment 35•15 years ago
This is tested to work. The Windows and Linux slaves are running P3 1.2GHz Tualatin-core CPUs. The Leopard ones are running either a G4 1.42GHz Mini or a dual 1.0GHz PowerMac G4. Until the above issue with mozilla-1.9.2 producing packaged unittests is addressed, I cannot run Tiger tests, as mozilla-central (the only branch I am running on currently) does not run on Tiger. Adding a new arch/OS to this testing isn't a lot of overhead, as long as it can run the standard unit test commands and we already produce compatible binaries.
Assignee
Comment 36•15 years ago
These machines fell down a very long time ago and are going to need some serious love to get them back up.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → INCOMPLETE
Comment 37•15 years ago
What does this mean?
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Assignee
Comment 39•15 years ago
(In reply to comment #36)
> These machines have fallen down a very long time ago and are going to need some
> serious love to get them back up.
(In reply to comment #37)
> what does this mean?
Sorry, let me clarify where things stand right now:
* Linux on P3 is running properly and has been since November 23, 2009. The tests have been orange/red since we started them running; according to Ted, this is because of tests timing out or not running correctly.
* Windows XP on P3 was running tests with orange/red results, like Linux, until November 27, 2009, but has not run since then. I don't know what the status of this machine is currently, but it is not responding to pings. It's unclear whether this is a hardware problem or a WinXP license problem.
* The Leopard slave on PowerPC G4 was working correctly with orange/red tests until January 22, but has not reported anything since then. I've looked around on the machine and found it stuck on a job since January 22. I've killed that and rebooted the machine this morning. Up until the 22nd of January this slave was working properly. I will see what happens overnight.
* The Tiger slave is not working, and I am not sure that it ever has worked.
All of these were running for nightly builds on mozilla-central. They take 6-12 hours to run one cycle on one nightly build, so we've never had enough horsepower to run this on other branches, or more frequently than just once per nightly.
Assignee
Comment 40•15 years ago
(In reply to comment #39)
> * Linux on p3 is running properly and has been since November 23, 2009. The
> tests have been orange/red since we started them running, this is because of
> tests timing out or not running correctly, according to Ted.
Linux tests were broken for a short time due to fallout from bug 549427. I imported http://hg.mozilla.org/build/buildbotcustom/rev/a20f711dc417 and did a restart.
> * leopard slave on PowerPC G4 was working correctly with orange/red tests,
> until January 22, but has not reported anything since then. I've looked around
> on the machine, and found it stuck on a job since January 22. I've killed that,
> and rebooted the machine this morning. Up until the 22 of january, this slave
> was working properly. I will see what happens overnight.
This machine came back from reboot but is not currently able to connect to the geriatric master. I am seeing messages like "<twisted.internet.tcp.Connector instance at 0x77f3c8> will retry in 36 seconds" in the slave log and nothing on the master side. I was able to nc the master on the correct slave port and did get the pb prompt. I am not sure what is going wrong.
Assignee
Comment 41•15 years ago
(In reply to comment #40)
> This machine came back from reboot but is not currently able to connect to the
> geriatric master. I am seeing messages like "<twisted.internet.tcp.Connector
> instance at 0x77f3c8> will retry in 36 seconds" in the slave log and nothing on
> the master side. I was able to nc the master on the correct slave port and did
> get the pb prompt. I am not sure what is going wrong.
Just realised that this was the slave not being able to find the master, as the old domain name must have disappeared.
s/geriatric-master.mv.mozilla.com/geriatric-master.build.mozilla.org/
has fixed this and the slave is back in the pool.
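For the record, the fix amounts to editing the master hostname in the slave's buildbot.tac and restarting the slave. A minimal sketch of that file, assuming a buildbot 0.7-era slave; the basedir, port, slave name, and password are placeholders, not the real values:

# Hedged sketch of the slave-side buildbot.tac after the hostname change.
from twisted.application import service
from buildbot.slave.bot import BuildSlave

basedir = '/builds/slave'                                # placeholder
buildmaster_host = 'geriatric-master.build.mozilla.org'  # was geriatric-master.mv.mozilla.com
port = 9010                                              # placeholder
slavename = 'geriatric-leopard-g4-01'                    # placeholder
passwd = 'XXXXXXXX'                                      # placeholder
keepalive = 600
usepty = 1
umask = None

application = service.Application('buildslave')
s = BuildSlave(buildmaster_host, port, slavename, passwd, basedir,
               keepalive, usepty, umask=umask)
s.setServiceParent(application)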
Assignee
Comment 42•15 years ago
(In reply to comment #39)
> * WindowsXP on p3 was running tests with orange/red tests, like linux, until
> November 27, 2009, but has not run since then. I don't know what the status of
> this machine is currently, but it is not responding to pings. Its unclear if
> this is a hardware problem or a WinXP license problem.
Reinstallation of Windows on this machine is being tracked in bug 563831. Installation and configuration of automation tools will be tracked in this bug.
Assignee
Comment 43•15 years ago
joduinn found two more Mac PPC systems. Tracking the work to add these to the geriatric master in bug 549559.
Depends on: 549559
Assignee
Comment 44•15 years ago
This patch brings long-overdue improvements to the geriatric master:
- Understand variants on the geriatric master instead of the build master
  - adding a new variant requires no change to the production master
- Split tests into their own builders
  - one builder per test on each platform variant (sketched below)
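Not the actual patch (see the attachment for that), but a hedged sketch of the one-builder-per-test idea in buildbot 0.7-style config, where builders are plain dicts. The variant names, slave lists, and factory helper below are illustrative placeholders:

# Hedged sketch: generate one builder per test suite on each platform variant.
PLATFORM_VARIANTS = {
    'linux-p3':   ['geriatric-linux-p3-01'],
    'win32-p3':   ['geriatric-win32-p3-01'],
    'macosx-ppc': ['geriatric-leopard-g4-01', 'geriatric-leopard-g5-01'],
}
TEST_SUITES = ['xpcshell', 'crashtest', 'reftest', 'mochitest-plain']

def make_test_factory(platform, suite):
    # Placeholder: the real config would return a BuildFactory that fetches
    # the latest nightly plus its tests package and runs `suite`.
    return {'platform': platform, 'suite': suite}

builders = []
for platform, slaves in PLATFORM_VARIANTS.items():
    for suite in TEST_SUITES:
        builders.append({
            'name': '%s-%s' % (platform, suite),
            'slavenames': slaves,
            'builddir': '%s-%s' % (platform, suite),
            'factory': make_test_factory(platform, suite),
        })

With this shape, adding a new platform variant is just another entry in PLATFORM_VARIANTS and requires no change on the production master side.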
Attachment #445439 -
Flags: review?(aki)
Assignee
Comment 45•15 years ago
required buildbotcustom changes
Attachment #445440 -
Flags: review?(aki)
Reporter
Updated•15 years ago
Attachment #445439 -
Flags: review?(aki) → review+
Reporter
Updated•15 years ago
Attachment #445440 -
Flags: review?(aki) → review+
Assignee
Comment 46•15 years ago
We now have OS X 10.5 coverage on Leopard.
Work on setting up the Windows XP slave is being tracked in bug 566955.
Depends on: 566955
Assignee
Comment 47•15 years ago
Comment on attachment 445439 [details] [diff] [review]
buildbot-configs patch
http://hg.mozilla.org/build/buildbot-configs/rev/ca0af754c5ea
Attachment #445439 -
Flags: checked-in+
Assignee
Comment 48•15 years ago
Comment on attachment 445440 [details] [diff] [review]
buildbotcustom patch
http://hg.mozilla.org/build/buildbotcustom/rev/48c1f50e0651
Attachment #445440 -
Flags: checked-in+
Comment 49•15 years ago
exceptions.ValueError: incomplete format
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1274349723.1274351109.10254.gz#err0
revision=WithProperties("%(got_revision)"),
should be
revision=WithProperties("%(got_revision)s"),
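For anyone puzzled by the traceback: "%(got_revision)" is an incomplete Python %-format because the mapping key is not followed by a conversion character, which is exactly what string interpolation rejects. A quick standalone illustration (plain Python, no buildbot needed):

# The broken pattern: mapping key with no trailing conversion character.
try:
    "%(got_revision)" % {"got_revision": "abc123"}
except ValueError as e:
    print(e)  # incomplete format

# The corrected pattern, matching the fix suggested above.
print("%(got_revision)s" % {"got_revision": "abc123"})  # abc123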
Comment 50•15 years ago
Armen fixed the exception:
http://hg.mozilla.org/build/buildbotcustom/rev/f1d6697a7100
Comment 51•15 years ago
Did you run this patch overnight? I am not sure if you would have been able to
hit it on staging or not (depends on the sendchanges).
Comment 52•14 years ago
So are we done here?
Updated•14 years ago
blocking2.0: beta1+ → beta2+
Assignee
Comment 53•14 years ago
(In reply to comment #52)
> So are we done here?
I believe so. Between June 17 and today the buildbot master was down. It was brought down to upgrade the RAM in the Mountain View ESX hosts and was never started up again afterwards. I have filed bug 574415 to add all the old machines to our Nagios alerts to avoid this problem in the future.
I have started tests against the latest nightlies and will report back with the results.
Assignee
Comment 54•14 years ago
I ran the tests with the latest nightly builds. The status is as follows:
- Linux
  - all tests failed because of an incompatibility with SELinux
- Leopard G5
  - xpcshell: orange 786/3
  - crashtest: green 1611/0/10
  - reftest: green 4449/0/219
  - mochitest-plain: green 183334/0/1474
- Leopard G4
  - xpcshell, crashtest, reftest: same as G5
  - mochitest-plain: still running with at least one test failure
- Win32
  - xpcshell: timed out
  - crashtest: green 1611/0/10
  - reftest: orange 4434/20/214
  - mochitest-plain: orange 206966/144/1469
I have disabled SELinux on the P3 computer and launched another round of tests. It's looking like the tests are actually running with SELinux off. I will report back in a couple of hours with the status of Leopard G4's mochitest results and the Linux results.
For the curious, the results of this testing go to http://tinderbox.mozilla.org/showbuilds.cgi?tree=GeriatricMachines
Assignee
Comment 55•14 years ago
(In reply to comment #54)
> I will
> report back in a couple hours with the status of Leopard-G4's mochitest results
> and the Linux results.
The Linux results are:
- xpcshell: orange 786/4
- crashtest: green 1612/0/9
- reftest: 4418/6/244
- mochitest-plain: timed out
The G4 and Linux mochitest-plain runs timed out while running '/tests/layout/style/test/test_value_cloning.html', after not generating any output for 5400 seconds. These tests all run slowly on these slow machines, so the oranges/timeouts are "normal", and fixing them would likely require reworking the test suites. For previous releases we've avoided that by always requiring human inspection (usually by ted, iirc), and this still seems to be true here.
It feels like the infrastructure setup work is done here, and if people want to rework tests to pass green on slower machines, that work should be tracked as separate bugs with the specific test suite owners. Does that seem reasonable?
Assignee
Comment 56•14 years ago
(In reply to comment #55)
> (In reply to comment #54)
> > I will
> > report back in a couple hours with the status of Leopard-G4's mochitest results
> > and the Linux results.
>
> The linux results are:
> -xpcshell orange 786/4
> -crashtest green 1612/0/9
> -reftests 4418/6/244
> -mochitest-plain timed out
>
> The G4 and Linux mochitest-plain timed out after running
> '/tests/layout/style/test/test_value_cloning.html' by not generating any output
> for 5400 seconds. These tests all run slowly on these slow machines, so the
> oranges/timeouts are "normal", and fixing these would likely require reworking
> the test suites. For previous releases, we've avoided that by always needing
> human inspection (usually by ted iirc), and this seems to be still true here.
>
> It feels like the infrastructure setup work is done here, and if people want to
> rework tests to pass green on slower machines, that work should be tracked as
> separate bugs with the specific test suite owners. Does that seem reasonable?
Just checked again today, all are still running with similar results. It looks like the infrastructure is set up. :-)
Status: REOPENED → RESOLVED
Closed: 15 years ago → 14 years ago
Resolution: --- → FIXED
Comment 57•14 years ago
Since this was blocking 1.9.2, can we set status.1.9.2 to at least final-fixed? Otherwise it still appears in queries as not being fixed.
Assignee
Comment 58•14 years ago
Am I able to set those flags?
Updated•11 years ago
Product: mozilla.org → Release Engineering