Bug 492589 - manually run unittest on two old non-SSE2 boxes
Opened 16 years ago, Closed 15 years ago
Status: RESOLVED FIXED
Categories: Release Engineering :: General, defect
Tracking: (Not tracked)
People: (Reporter: ted, Assigned: ted)
Keywords: fixed1.9.1
Attachments: (4 files)
There were some non-SSE VMs created in bug 462190. We'd like to get unittests running on them regularly, but as a stopgap I'm going to try one-off unittest runs on them.
Flags: blocking1.9.1+
Updated•16 years ago (Assignee)
Status: NEW → ASSIGNED
Comment 1•16 years ago (Assignee)
I'd like to get a sanity-check that these VMs do in fact have SSE disabled, but I need help. /proc/cpuinfo is not inspiring confidence, certainly:
[cltbld@moz2-linuxnonsse-slave01 builds]$ grep sse /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss constant_tsc up
Comment 3•16 years ago (Assignee)
Yeah, I really don't believe these VMs are non-SSE. test.c:
#include <stdio.h>

/* Query CPUID leaf `func`; the results come back in EAX/EBX/ECX/EDX. */
#define cpuid(func, ax, bx, cx, dx) \
    __asm__ __volatile__ ("cpuid" : \
        "=a" (ax), "=b" (bx), "=c" (cx), "=d" (dx) : "a" (func));

int main(int argc, char **argv)
{
    int a, b, c, d;
    cpuid(0x1, a, b, c, d);                           /* leaf 1: feature flags */
    if (d & (1 << 25)) { printf("sse enabled\n"); }   /* EDX bit 25 */
    if (d & (1 << 26)) { printf("sse2 enabled\n"); }  /* EDX bit 26 */
    if (c & (1 << 0))  { printf("sse3 enabled\n"); }  /* ECX bit 0  */
    return 0;
}
[cltbld@moz2-linuxnonsse-slave01 builds]$ gcc -o testsse test.c
[cltbld@moz2-linuxnonsse-slave01 builds]$ ./testsse
sse enabled
sse2 enabled
sse3 enabled
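(As an aside for later readers: this is just a reference sketch, not anything used in this bug, but GCC 4.8 and newer expose the same CPUID check as a builtin, so the inline assembly isn't needed there.)
#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();  /* harmless in main; required if the checks run before constructors */
    if (__builtin_cpu_supports("sse"))  printf("sse enabled\n");
    if (__builtin_cpu_supports("sse2")) printf("sse2 enabled\n");
    if (__builtin_cpu_supports("sse3")) printf("sse3 enabled\n");
    return 0;
}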
Comment 4•16 years ago
You should be able to check the VM's config - Phong, is that right?
Comment 5•16 years ago (Assignee)
I don't think this will work at all, per VMWare (quoted here):
http://www.novosco.com/articles/2008/08/19/vmware-esx-and-enhanced-vmotion-compatibility/
I don't know if we're using Enhanced VMotion Compatibility or not, but if not:
* SSE features can be used by user-level code (applications).
* Mask does not work for user-level code (i.e. applications).
* In user-level code, CPUID is executed directly on hardware and is not intercepted by VMware.
* Thus, VM cannot reliably hide SSE from an application
Even if we are:
EVC utilizes hardware support to modify the semantics of the CPUID instruction only. It does not disable the feature itself. For example, if an attempt is made to disable SSE4.1 by applying the appropriate mask to a CPU that has this feature, the feature bit indicates SSE4.1 is not available to the guest or the application, but the SSE4.1 instructions themselves (such as PTEST and PMULLD) are still available for use. This implies that applications which do not use the CPUID instruction to determine the list of supported features, but instead try/catch undefined instructions (#UD), can still detect the existence of this feature.
This won't let us test what we're trying to test.
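For illustration (this is my own sketch, not code anyone ran here; it assumes Linux/x86 and gcc), the "try/catch undefined instructions (#UD)" probing described above looks roughly like this: execute an SSE2 instruction and catch the SIGILL that results if the CPU truly lacks it, regardless of what CPUID reports:
#include <stdio.h>
#include <signal.h>
#include <setjmp.h>

static sigjmp_buf probe_env;

/* If the instruction raises #UD, the kernel delivers SIGILL and we jump back out. */
static void on_sigill(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

int main(void)
{
    signal(SIGILL, on_sigill);
    if (sigsetjmp(probe_env, 1) == 0) {
        /* paddq is an SSE2 instruction; a CPU without SSE2 faults here. */
        __asm__ __volatile__ ("paddq %xmm0, %xmm0");
        printf("sse2 instructions actually execute\n");
    } else {
        printf("sse2 instructions fault (no SSE2)\n");
    }
    return 0;
}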
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Comment 6•16 years ago (Assignee)
I'm trolling for community help; there's probably someone out there with older hardware that we can get to do this:
http://forums.mozillazine.org/viewtopic.php?f=23&t=1247655
Comment 7•16 years ago
http://www.nongnu.org/qemu/qemu-tech.html#SEC3
QEMU does not have SSE support. It also loads VMDK images, if I am not mistaken, and it runs on Windows, OS X, and Linux.
Comment 8•16 years ago
Found an old P3 machine, Egg; going to do a unit test run. Maybe I need to find a spare disk that I can format.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Updated•16 years ago (Assignee)
Assignee: ted.mielczarek → jford
Status: REOPENED → ASSIGNED
Summary: try a one-off unittest run on non-SSE VMs → try a one-off unittest run on some random old box that reed found
Updated•16 years ago (Assignee)
Summary: try a one-off unittest run on some random old box that reed found → try a one-off unittest run on some random old box that reed found (non-SSE)
Comment 9•16 years ago
I have two old HP servers (btek, spider) that Reed has given me the OK to use. I am just about done installing Ubuntu on one, and I am waiting on an XP license for the other.
Comment 10•16 years ago
(In reply to comment #8)
> Found an old P3 machine, Egg; going to do a unit test run. Maybe I need to
> find a spare disk that I can format
Actually, there were 3 machines: btek, spider and egg.
egg turned out to be way older, so we cannibalized parts from egg to increase the
RAM and replace the useless video card in btek.
btek now has Ubuntu 9.04 installed, with a cltbld account on it. However, it still
needs network configuration, DNS configs, etc.
spider now has WinXP installed, with a license key and a cltbld account on it. It
also needs network configuration, DNS configs, etc. Both machines will also need
VNC (or RDP) installed for Ted to be able to remotely connect and use them for
running tests.
Per discussion with shaver and damons this morning about other priorities,
these machines are being handed back to IT to finish the o.s. setup. Once both
are ready, please reassign back, so Ted can try a manual unittest run on them.
Comment 11•16 years ago
(In reply to comment #3)
> Yeah, I really don't believe these VMs are non-SSE. test.c:
> [test program and its output snipped; see comment 3]
Also, jhford ran ted's diagnostic program on btek and only got "sse enabled", which is expected: these machines have dual P3 CPUs running at 500MHz, and they do not have SSE2 or SSE3.
Summary: try a one-off unittest run on some random old box that reed found (non-SSE) → manually run unittest on two old non-SSE2 boxes
Updated•16 years ago
Assignee: jford → server-ops
Component: Release Engineering → Server Operations
Flags: blocking1.9.1+
OS: Linux → All
QA Contact: release → mrz
Hardware: x86 → All
Comment 12•16 years ago
>
> Per discussion with shaver and damons this morning about other priorities,
> these machines are being handed back to IT to finish the o.s. setup. Once both
> are ready, please reassign back, so Ted can try a manual unittest run on them.
I know we talked about this on the phone, but what IT steps are left? Are the boxes up and running?
Comment 13•16 years ago
(In reply to comment #12)
> >
> > Per discussion with shaver and damons this morning about other priorities,
> > these machines are being handed back to IT to finish the o.s. setup. Once both
> > are ready, please reassign back, so Ted can try a manual unittest run on them.
>
> I know we talked about this on the phone, but what IT steps are left? Are the
> boxes up and running?
Boxes now reassembled and at reed's desk. They'd need to be racked somewhere (downstairs in K?), and then also need the following from comment#10:
"...network configuration, DNS configs, etc. These will also need
VNC (or RDP) ..."
Comment 14•16 years ago
Reed gets this because they're sitting next to his desk :)
Assignee: server-ops → reed
Comment 15•16 years ago
spider is racked and cabled... 5/19 on the switch just needs its vlan changed from 200 to 500, and it'll be ready to go. I just turned RDP on for now. If you need VNC, you're welcome to install it. It'll be accessible at spider.office.mozilla.org within the MV Office VPN once the vlan has been changed and the networking restarted.
btek, on the other hand, is dead. When it was plugged in, its power supply instantly died and made smelly smoke, as the power supply was set for 115V instead of 230V. We can either try to replace the power supply or just get another box instead. Thoughts?
Comment 16•16 years ago (Assignee)
I'd go with whatever you think is fastest.
Comment 17•16 years ago (Assignee)
I've got a mochitest run started on spider (WinXP). I downloaded the latest 1.9.1 unittest build that was available, which was this one:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-1.9.1-win32-unittest/1242749541/
Comment 18•16 years ago
btek is broken. I have balsa here and I can move the good hardware from btek into balsa when there is some spare time.
Comment 19•16 years ago
I have rebuilt balsa's hardware to have 2x 500MHz P3 CPUs that are identical to the ones in spider. I have also installed the SCSI card and drives from btek, but it isn't booting properly and the hard drives are not being picked up by the SCSI BIOS. If the SCSI card cannot be coerced into working, there are some ATA drives left over from egg which can be used, but that requires a reinstall of Linux. I have the rebuilt balsa and the remnants of egg and btek by my desk. What do I do with them? egg is totally broken, but btek could be useful for spares.
Comment 20•16 years ago (Assignee)
The mochitest run on spider finished without crashing. I'll run through the rest of the test suites today.
Comment 21•16 years ago (Assignee)
I ran through all of our test suites (mochitest, mochitest chrome, mochitest browser-chrome, mochitest a11y, reftest, crashtest, xpcshell tests) on spider. There were some test failures (that I didn't look into very deeply, but most look like the same kind of intermittent failures as on tinderbox), but no crashes.
Comment 22•16 years ago
I have rebuilt balsa and it does not work at all. The options I can think of are running a dual boot on spider, which effectively makes automation impossible, or finding new hardware.
Updated•16 years ago
Assignee: reed → nobody
Component: Server Operations → Release Engineering
QA Contact: mrz → release
Comment 24•16 years ago
And marking blocking1.9.1+. We need to run this after all the JS bugs are in, and again before each RC.
Comment 25•16 years ago
(In reply to comment #21)
> I ran through all of our test suites (mochitest, mochitest chrome, mochitest
> browser-chrome, mochitest a11y, reftest, crashtest, xpcshell tests) on spider.
> There were some test failures (that I didn't look into very deeply, but most
> look like the same kind of intermittent failures as on tinderbox), but no
> crashes.
We need to look at the test failures: only one failure mode (generating SSE2 code on a non-SSE2 machine, and calling it) will result in a SIGILL crash. We also need to know that the x87/non-SSE2 code that we generate is correct!
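To make the failure mode concrete, here's a rough sketch (purely illustrative; not the actual JIT code, and the square_* routines are hypothetical stand-ins) of the kind of runtime dispatch in question. Only if detection wrongly picks the SSE2 path on a non-SSE2 box do we get SIGILL; a buggy x87 fallback would instead return wrong numbers silently, which is why the failures need inspecting:
#include <stdio.h>

/* Same check as the test program in comment 3: CPUID leaf 1, EDX bit 26 = SSE2. */
static int have_sse2(void)
{
    int a, b, c, d;
    __asm__ __volatile__ ("cpuid"
                          : "=a" (a), "=b" (b), "=c" (c), "=d" (d)
                          : "a" (1));
    return (d >> 26) & 1;
}

/* Hypothetical stand-ins: in the real thing one path is generated with SSE2
 * instructions and the other with plain x87 floating point. */
static double square_sse2(double x) { return x * x; }
static double square_x87(double x)  { return x * x; }

int main(void)
{
    double (*square)(double) = have_sse2() ? square_sse2 : square_x87;
    printf("%g\n", square(1.5));
    return 0;
}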
Comment 26•16 years ago (Assignee)
Ok, I'll collate and attach them to the bug in a bit.
Updated•16 years ago (Assignee)
Assignee: nobody → ted.mielczarek
Comment 27•16 years ago (Assignee)
First, the simple:
crashtest, mochitest-a11y, xpcshell: 0 failures
Comment 28•16 years ago (Assignee)
mochitest-chrome: 1 failure
Comment 29•16 years ago (Assignee)
mochitest-browser-chrome: 6 failures
2 of these are because bug 475383 hasn't landed on branch. The others may just be fallout from that failure; I didn't investigate fully.
Comment 30•16 years ago (Assignee)
reftest had a bunch of failures, but then I noticed that the first one was colordepth.html and realized that my RDP connection was 16-bit color, so those failures are probably all a result of that.
Comment 31•16 years ago (Assignee)
mochitest failures: 21
9 of these are known: 2 are from bug 475383 again, 7 are from the geolocation tests (bug 489817). I didn't investigate the rest.
Comment 32•16 years ago (Assignee)
I'll fire off another run today as well (on the same build).
Comment 33•16 years ago
What happened with that run, please?
Comment 34•16 years ago
Do we have updated info here?
Comment 35•16 years ago (Assignee)
Sorry, lost track of this over the weekend. Summary from the second run:
mochitest-plain: somewhat different results, will attach log in a minute
mochitest-chrome: exact same result as previous run
mochitest-browser-chrome: one additional failure:
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_1.js | Timed out
Anything not mentioned still had zero failures.
Comment 36•16 years ago (Assignee)
Ok, there are 23 failures in this log, of which 9 are known (as before, the plugin tests and geolocation tests).
Comment 37•16 years ago
So there are 14 bugs to file, I guess. :-(
If we're seeing consistent fails on mochitest-chrome, doesn't that mean that they're probably not just the usual sometimes-orange randoms?
Comment 38•16 years ago (Assignee)
The mochitest-chrome failure was a known random that didn't get a fix backported to branch, bug 468189. (Although interestingly on this machine it sure seems repeatable!)
Comment 39•16 years ago (Assignee)
I think the browser-chrome failures are all fallout from the plugin test failing. It opens a tab, and then doesn't clean it up if it doesn't finish successfully. We should file a bug on making that test clean up after itself better.
Comment 40•16 years ago (Assignee)
In the mochitest failures, I looked at:
31268 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/ajax/offline/test_fallback.html | Fallback page displayed for top level document
I think this test is broken; it has a 3-second timeout internally:
http://mxr.mozilla.org/mozilla-central/source/dom/tests/mochitest/ajax/offline/test_fallback.html?force=1#71
This machine is *really* slow, so it wouldn't surprise me if we hit that.
Comment 41•16 years ago (Assignee)
(In reply to comment #39)
> I think the browser-chrome failures are all fallout from the plugin test
> failing. It opens a tab, and then doesn't clean it up if it doesn't finish
> successfully. Should file a bug on making that test clean up after itself
> better.
I re-ran browser-chrome with the plugin test moved out of the way, and got just one failure:
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/places/tests/perf/browser_ui_history_sidebar.js | Timed out
Suspiciously, this is in a "tests/perf" directory, and it looks like the test does a lot of work. The browser-chrome harness has a 30-second timeout, so it seems likely that this test just can't finish in time.
Comment 42•16 years ago
How many times have we looped through these test runs so far?
Comment 43•16 years ago (Assignee)
Just two runs through the full test suite, on the same build (mentioned in comment 17). Happy to do more runs, or on a newer build, whatever floats your boat.
Comment 44•15 years ago
Ted, be ready to run these on notice. I'm guessing we'll want to run this before we ship the RC.
Comment 45•15 years ago (Assignee)
Will do, I was planning on grabbing a build from this morning and giving it another run.
Comment 46•15 years ago
Might be time to run this again?
Comment 47•15 years ago
Yeah, can use the b99 builds when they're out.
Comment 48•15 years ago (Assignee)
I re-ran this on a build from Thursday(?) and got extremely similar results, although I didn't finish the analysis. I think this box is currently MIA due to the office move, so hopefully someone can plug it back in on Monday.
Comment 49•15 years ago
(In reply to comment #48)
> I think this box is currently MIA due to the
> office move, so hopefully someone can plug it back in on Monday.
Both nonsse machines are AWOL. They didn't show up in the new server room, or on any of the RelEng desks in the new office. I already went back to the Building K server lab this morning, and they are not there.
I'll go back and search a few other rooms in Building K later today.
Comment 50•15 years ago
One non-sse machine was in the server room but was powered off. It is now connected using a DHCP address of 10.250.6.227, and I am working on getting a DNS hostname for it in bug 496946.
This machine is a P3-500MHz with 384MB of RAM. SSH is working on it, and I will email the username and password to Ted.
Comment 51•15 years ago
(In reply to comment #49)
> (In reply to comment #48)
> Both nonsse machines are AWOL. They didn't show up in the new server room, or on
> any of the RelEng desks in the new office. I already went back to the Building K
> server lab this morning, and they are not there.
>
> I'll go back and search a few other rooms in Building K later today.
John Ford and I went dumpster-diving in the old Buildings K and S. We found the nonsse machine, as well as a few other nonsse and ppc machines, and brought them all back to the new office.
We should have the pre-existing nonsse machine back online today sometime, and will find out how many of the other machines even work at all. Very happy with the additional nonsse and ppc machines found; quite a productive afternoon's scavenging!!
Comment 52•15 years ago (Assignee)
I've got a Mochitest run started on the Linux machine.
Comment 53•15 years ago
Looks like we're done with all blockers for RC. Need to run everything again?
Comment 54•15 years ago
(In reply to comment #51)
> (In reply to comment #49)
> > (In reply to comment #48)
[snip]
> We should have the pre-existing nonsse machine back online today sometime...
Forgot to update this bug earlier. jhford got the nonsse win32 machine up and running again Tues. DNS is still a bit unsettled in new office, but these IPs work:
linux: 10.250.6.227
win32: 10.250.5.20
Comment 55•15 years ago
(In reply to comment #54)
> DNS is still a bit unsettled in new office, but these IPs
> work:
>
> linux: 10.250.6.227
> win32: 10.250.5.20
Are there bugs on file to get these assigned static IPs?
Comment 56•15 years ago
There is one for goat, the linux one (bug 496946). The windows one had a working one before (spider), but I guess it was removed when it was moved to the junk pile. I can file a separate bug or expand the linux one; either works for me.
Comment 57•15 years ago
Ted: can you run a set of unit tests on RC3 using these boxes so we can close this out?
Comment 58•15 years ago (Assignee)
I'm OOTO today, and traveling this weekend, so I can't get to it until Monday. If you want it sooner than that you'll have to find someone else, sorry.
Comment 59•15 years ago
Adding Joel Maher as he will be running the tests this afternoon.
Comment 60•15 years ago
I am seeing a LOT more errors from the runs I did on linux/windows this weekend.
For example, the linux mochitests have 331 failures (I ran twice to verify)! Also, the linux browser-chrome tests did not finish (verified twice), as they hung on the sessionrestore tests!
# of failures:
test              linux   windows
xpcshell              0         0
reftest               3       123
crashtest             0         0
mochitest           331        13
chrome                9         0
browser-chrome       20        10
a11y                  0         0
Comment 61•15 years ago (Assignee)
I don't believe I ever ran the unittests on that Linux box, as it didn't exist when I started this testing.
The windows reftest results may be completely wrong, as you have to be careful to connect using 24-bit color with remote desktop. The mochitest/browser-chrome results look to be in line with what I saw, and were all harmless failures (tests relying on the test plugin, which is a known failure on branch packaged tests currently, or tests that are intermittent failures/timeout on slow hardware).
Comment 62•15 years ago
Let me try the reftests again on windows. Thanks for the data, Ted.
Comment 63•15 years ago
This is the failure log after re-running browser-chrome tests after removing:
mochitest/browser/browser/base/content/test/browser_pluginnotification.js
cltbld@SPIDER /c/ff35_unittest/mochitest
$ grep UNEXPECTED-FAIL bchrome.log
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/places/tests/browser/browser_410196_paste_into_tags.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/places/tests/perf/browser_ui_history_sidebar.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_1.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_2.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_3.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_4.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_5.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_6.js | Timed out
TEST-UNEXPECTED-FAIL | chrome://mochikit/content/browser/browser/components/preferences/tests/browser_privacypane_7.js | Timed out
cltbld@SPIDER /c/ff35_unittest/mochitest
$
Comment 64•15 years ago
Can we get an assessment here of whether or not we are good to go?
Comment 65•15 years ago
I'm pretty sure we're good, based on that log.
Comment 66•15 years ago (Assignee)
Yeah, those are just timeouts from tests that take too long because this machine is so godawful slow. If we're going to get automated builds on this machine, we should file a bug to track the test issues we'll need to resolve to get green tests on this machine, but I don't see anything that's an actual problem with running the builds here.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago → 15 years ago
Resolution: --- → FIXED
Comment 67•15 years ago
If the work here is finished, could you please mark status1.9.1 accordingly, or at least add the fixed1.9.1 keyword? I'm querying Bugzilla for unfinished 1.9.1 bugs and this is still marked as unfinished. Any other way to mark that we are done here that I can query on also works.
Updated•15 years ago
Keywords: fixed1.9.1
Updated•11 years ago
Product: mozilla.org → Release Engineering