Closed
Bug 515457
Opened 15 years ago
Closed 15 years ago
Linux x64 JIT and non-JIT TraceMonkey tinderboxes
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dvander, Unassigned)
References
Details
(Whiteboard: [linux64])
We're aiming to have x64 TraceMonkey support at full parity with the x86 version. Would it be possible to get a TM tinderbox for Linux x64, and test coverage boxes for JIT pref'd on and off?
Updated•15 years ago
Blocks: support-L64
Updated•15 years ago
Component: Release Engineering → Release Engineering: Future
Updated•15 years ago
Component: Release Engineering: Future → Release Engineering
Comment 1•15 years ago
What are the criteria for marking a bug Future? I don't think this is a :Future bug.
Comment 2•15 years ago
I'm not entirely sure what you're asking for, but it's at least the following:
* A pool of x86-64 linux slaves (which we don't have a reference platform for) - at least 5 or 6, probably more
* A pool of x86-64 talos slaves (which we don't have an image for) - I'm not sure how many
To provide this we need to spend time bringing our Linux 64-bit slaves up to parity with the 32-bit ones. We also need to create a 64-bit Talos image from scratch, get minis, and deploy it.
It's not a trivial amount of work. I'm not in a position to judge the priority of this over the other things on our plate. IMHO we should be talking about this relative to other things that could be Q4 goals - this isn't getting done in Q3 unless many existing things are dropped. John is away this week, but Justin might have a better idea about this.
Comment 3•15 years ago
This should be a future bug until:
- the tests are passing reliably in both configurations
- we know exactly what configuration we want set up
- we can say what its impact is on which product releases, so that it can be triaged against other work that affects product releases
This stuff should go to releng when development has ensured that it works, and we just need it added to production automation, not in order to find out if it will work. If we need an x64 VM for someone to do the initial verification on, we can get one.
Comment 4•15 years ago
Agreed with comment #3 - is everyone OK with closing this as INCOMPLETE until the steps in there are completed? Infra can help with getting test machines or VMs up for development if needed...
Comment 5•15 years ago
(In reply to comment #2)
> I'm not entirely sure what you're asking for, but it's at least the following:
> * A pool of x86-64 linux slaves (which we don't have a reference platform for)
I assumed we had a reference platform because we build something called "Linux x86-64 mozilla-central nightly"
(In reply to comment #3)
> This should be a future bug until:
> - the tests are passing reliably in both configurations
This bug got filed in anticipation of doing that. The only way to find out for sure is to run the tests on continuous integration infrastructure.
> - we can say what its impact is on which product releases, so that it can be
> triaged against other work that affects product releases
Some platform work has to happen ahead of product release decisions. We know we need this, right?
Comment 6•15 years ago
We spent a significant amount of work getting our 64-bit JIT backend to work. If we don't have tinderbox test coverage, it will regress; ARM serves as a clear example of that. If we want a working 64-bit JIT backend either now or in the near future (as in the next year or so), we will need automated tests to make sure it keeps running from this point forward. I can't believe it's so incredibly hard to set up some sort of tinderbox testing here. If a full browser build is difficult or doesn't pass our internal tests, we could start with running trace-tests, and maybe build a browser and start it and shut it down as a minimal regression test. If VMs are hard to set up, let's buy a $500 machine, install Ubuntu, and let it run the column.
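Just to make that concrete: the whole minimal smoke-test column could be as small as the sketch below, given a finished build. The objdir paths and the trace-test.py invocation here are guesses for illustration, not the actual harness layout.

#!/usr/bin/env python3
import subprocess
import sys

# Hypothetical paths - adjust for the real source/objdir layout.
JS_SHELL = "objdir/js/src/js"
TRACE_TESTS = "js/src/trace-test/trace-test.py"
BROWSER = "objdir/dist/bin/firefox"

# Step 1: run the trace-test suite against the freshly built shell.
if subprocess.call([sys.executable, TRACE_TESTS, JS_SHELL]) != 0:
    sys.exit("trace-tests failed")

# Step 2: start the browser and shut it down again, as a bare
# "does it even launch" regression check.
proc = subprocess.Popen([BROWSER, "-no-remote", "about:blank"])
try:
    proc.wait(timeout=60)  # exiting on its own this quickly means a crash
    sys.exit("browser exited prematurely")
except subprocess.TimeoutExpired:
    proc.terminate()       # still alive after a minute: call that a pass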
Comment 7•15 years ago
Clarification for #1: we want the backend to be at full parity. _Not_ the tests (not necessarily, at least not right away).
Comment 8•15 years ago
We have 64-bit development machines for Mac OS X. I think we can run a Linux VM on those to run trace-tests and builds. Could we mirror the following machine onto the TM tree so it builds there as well:
Linux x86-64 mozilla-central build
It doesn't run tests at all, but at least we won't accidentally regress the 64-bit builds we check on the m-c tree.
Comment 9•15 years ago
I have to agree with gal here. It's one thing to say "we don't have the staffing and/or hardware", but don't make up procedural roadblocks. Particularly not "all tests already passing" - that's nonsense. We shipped 3.5 without *any* of the trace-tests active on a tinderbox; shall we just XFAIL the whole suite out and then re-enable tests as we go? I'm willing, if that's all it takes.
The amount of push-back and delay we get on "add something to tinderbox so we can even keep that platform working at all" is inordinate. This is a multi-platform JIT. Every change is likely to break something unless given full-coverage testing.
Comment 10•15 years ago
Shaver just mentioned we might get 64-bit builds and supporting test infrastructure for Mac OS X. We could use that to exercise the 64-bit backend. Ideally 64-bit Linux in addition would be good, but it's much lower priority if there is at least some sort of 64-bit x86 build running on the waterfall.
Comment 11•15 years ago
Andreas: it's not that it's "incredibly hard", just like it's not "incredibly hard" for developers to run tests on a 64-bit machine (or an ARM machine) before checking in. But, like with busy developers, releng has a lot of stuff on their plate, and asking them to bring up a new platform "from this point forward" so that we can support the platform "in the next year or so" is not productive, nor respectful.
I want all tests passing before we hand it to releng so that we can distinguish "problems with the production config" from "problems with the code"; the mobile experience here, where people are chasing phantom problems due to test suites that are not reliably green, and poking releng about diagnostics and logs that are hard to self-service, is exactly what I want to avoid.
We don't have Linux x64 infrastructure, other than one build host (which I actually thought was a reference image, but I can't tell from the log). Getting it will require a) scheduling that work into releng's timeline, which currently includes a ton of mobile stuff, or b) some nearer-term hole-plugging work from the 64-bit VM people.
It would, though, be helpful if instead of "you might be asking for the world, so => future" the response were to be "we can do builds now, doing tests requires an image that we can clone, and doing talos requires another image we can clone -- could you help get a machine up on an x86-64 VM with the instructions at http://wiki.mozilla.org/Build/BuildSlaveConfig/Linux and then we can clone it a couple of times in the interim? that'll at least give you test coverage, if not perf".
It would also be helpful if engineers were specific about what they're asking for, and on what timeline. That will be less likely to trip a world-is-collapsing-into-a-black-hole reaction, and will let us more usefully and pleasantly figure out what the right balance of time vs. engineer work vs. releng plate-shuffling is.
Comment 12•15 years ago
(In reply to comment #11)
>
> It would also be helpful if engineers were specific about what they're asking
> for, and on what timeline.
I hear the x64 backend is "almost ready". So, let's say three weeks for a timeline, using the same reference Linux as x86-32 except compiled for 64-bit, with OS X Snow Leopard 64-bit following two weeks later. Is more information needed?
Comment 13•15 years ago
(In reply to comment #0)
> We're aiming to have x64 TraceMonkey support at full parity with the x86
> version. Would it be possible to get a TM tinderbox for Linux x64, and test
> coverage boxes for JIT pref'd on and off?
Enabling builds on TM shouldn't be a problem, but as has been mentioned above, there is currently only one x64 Linux VM running, so wait times for builds will be pretty terrible until more VMs can be added.
What tests are you wanting to run? Unittests and/or Talos? And how will the JIT be pref'd on and off?
For unittests, we would run them on the same VM as the builds run on. This would make wait times for builds even worse, since now the single VM is doing tests and builds on two branches. Also, we've never run unittests on our x64 Linux VM, so time will have to be spent determining whether test failures are due to something in the way the VM is set up, or reflect real code problems.
We have no x64 Linux image for the Talos slaves, so there's no way to get Talos tests up and running any time soon.
We also have no quick way of bringing up more x64 Linux VMs at the moment.
So, until all of the above can be figured out, and until somebody from releng can find time to work on this, this bug belongs in our Future queue.
Component: Release Engineering → Release Engineering: Future
Comment 14 (Reporter)•15 years ago
Sorry for the confusion here. If we plan to ship x64 at some point, we'll need the full infrastructure. But indeed for now all I'm looking for is some way to automate testing+builds on our own checkins. Without _something_ the platform won't be taken seriously.
I would be glad to set up a VM myself if I can get a copy of VMware or whatever is needed, especially if it'll make things easier for releng when the time comes.
Is it possible to hook up a local office machine to the TM tinderbox? We have a machine for running mochitests and it's got plenty of spare cycles.
Shaver: the link in comment #11 doesn't work for me - is there a working doc link?
Comment 15•15 years ago
(In reply to comment #14)
> Sorry for the confusion here. If we plan to ship x64 at some point, we'll need
> the full infrastructure. But indeed for now all I'm looking for is some way to
> automate testing+builds on our own checkins. Without _something_ the platform
> won't be taken seriously.
>
> I would be glad to set up a VM myself if I can get a copy of VMware or whatever
> is needed, especially if it'll make things easier for releng when the time
> comes.
>
> Is it possible to hook up a local office machine to the TM tinderbox? We have a
> machine for running mochitests and it's got plenty of spare cycles.
Yes, it's possible to do this, although I don't know the specifics. I believe the static analysis box is set up exactly this way.
>
> Shaver: the link in comment #11 doesn't work for me - is there a working doc
> link?
Our 32-bit reference platform docs are here:
https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0
The 64-bit docs, such as they are, are here:
https://wiki.mozilla.org/ReferencePlatforms/Linux-CentOS-5.0_64-bit
To get to the point where we're officially supporting 64-bit Linux, we'd need to get the VM to a state where it has the same tools as the 32-bit VM.
Comment 16•15 years ago
(In reply to comment #15)
> (In reply to comment #14)
> > Is it possible to hook up a local office machine to the TM tinderbox? We have a
> > machine for running mochitests and it's got plenty of spare cycles.
>
> Yes, it's possible to do this, although I don't know the specifics. I believe
> the static analysis box is setup exactly this way.
It's basically sending 2 emails per build to tinderbox-daemon@tinderbox.m.o.
They begin with a header like this:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/webtools/tinderbox/README&rev=1.15#118
The first is a "starting" email, with the tinderbox tree and builder name, and the second is a status email with the exit status (success/test_failed/failed?) and full log.
I hear tales of a tinderbox client script harness, but I've always rolled my own, so I don't know the specifics about that.
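Roughly, each of the two mails looks like the sketch below. This is the protocol as I understand it from that README; the exact header field names and status values are from memory, so check them against the README before relying on this.

#!/usr/bin/env python3
import smtplib
import time
from email.mime.text import MIMEText

DAEMON = "tinderbox-daemon@tinderbox.mozilla.org"
TREE = "TraceMonkey"                   # tinderbox tree name (assumed)
BUILDER = "Linux x86-64 tracemonkey"   # builder/column name (assumed)
BUILDDATE = int(time.time())           # same value in both mails ties them together

def send_report(status, log=""):
    # The body starts with a "tinderbox:" header block that the daemon
    # parses; these field names follow the README above, as best I recall.
    body = "\n".join([
        "tinderbox: tree: %s" % TREE,
        "tinderbox: build: %s" % BUILDER,
        "tinderbox: builddate: %d" % BUILDDATE,
        "tinderbox: status: %s" % status,  # building/success/testfailed/busted
        "tinderbox: errorparser: unix",
        "tinderbox: END",
        log,
    ])
    msg = MIMEText(body)
    msg["Subject"] = "%s %s" % (BUILDER, status)
    msg["From"] = "builder@example.com"    # placeholder sender
    msg["To"] = DAEMON
    smtplib.SMTP("localhost").sendmail(msg["From"], [DAEMON], msg.as_string())

send_report("building")       # mail 1: the build is starting
# ... run the actual build and tests here ...
send_report("success", log=open("build.log").read())  # mail 2: outcome + full log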
Comment 17•15 years ago
(In reply to comment #5)
> (In reply to comment #2)
> > I'm not entirely sure what you're asking for, but it's at least the following:
> > * A pool of x86-64 linux slaves (which we don't have a reference platform for)
>
> I assumed we had a reference platform because we build something called "Linux
> x86-64 mozilla-central nightly"
>
To expand on what Catlee said, we have one Linux 64-bit VM per master (a total of 2). They were never standardized on a ref platform, nor kept in line with the 32-bit Linux VMs, because they were originally set up for bug 359336 for the purpose of catching very specific problems - explicitly not with the intention of 32-bit parity.
Comment 18•15 years ago
Justin: how hard is it to clone the existing x86-64 VM, to alleviate the load concerns catlee raised for the single VM in play? I assert that even on a split load (especially with something like ccache to take advantage of the fact that tm and m-c are often only slightly diverged, but I digress!) the wait for builds will be less than several weeks, which is what we're otherwise facing. I further assert that anyone who cares about those 64-bit builds _really_ wants the JIT to work, so they won't mind splitting the cycles with the TM tree to make sure it stays in good shape.
Related: can we get that VM cloned multiple times? It seems from those 64-bit docs that once 'things that are done for you' is complete, it's less than 30 mins of work to get the rest of the stuff installed, but the 32-bit docs are longer, and I'm not sure how to construct the resulting merge in my head. I'm willing to live without the scratchbox or hildon stuff on the TM 64-bit systems, and even with the noisy esd/ALSA stuff. I don't know from the puppet stuff, and hope that we can live without Java too! (Do we still need that post-OJIectomy? I hope not, but fear that some xulrunner config will bone us. To the mozconfig!)
I think that we're willing to trust that the 32 -> 64 bit transition on a system like CentOS will have minimal strange outlying effects, relative to our own code. Certainly much less than going to a 64-bit OS X or Windows, since the CentOS/RHEL 64-bit products have been in deployment for a lot of people for a long time. The TM crew would be willing guinea pigs for raising a 64-bit reference image in situ, I am fully confident, so if we can get there piecemeal as we enable new tests, it seems like everyone wins.
The first step seems to be getting that existing single VM set up to watch the tm branch for builds too, and then we can do the cloning/building dance for the unit-test parts. Even just running trace-tests would be a big step forward. Should I file narrower bugs for that? I looked for the bug that resulted in moz2-linux64-slave01, but couldn't find it -- might be behind the infra-confidential checkbox for some reason, or I might just be failing at bugzilla.
Comment 19•15 years ago
(Ben found the bug; I failed at bugzilla. :-( )
Comment 20•15 years ago
(In reply to comment #18)
> I assert that even on a
> split load (especially with something like ccache to take advantage of the fact
> that tm and m-c are often only slightly diverged, but I digress!)
I don't think we're in a place to just flip on ccache - Catlee did some work there, so he would have a better idea. In any case, 64-bit dep builds are pretty zippy without it.
> Related: can we get that VM cloned multiple times? It seems from those 64-bit
> docs that once 'things that are done for you' is complete, it's less than 30
> mins of work to get the rest of the stuff installed, but the 32-bit docs are
> longer, and I'm not sure how to construct the resulting merge in my head. I'm
> willing to live without the scratchbox or hildon stuff on the TM 64-bit
> systems, and even with the noisy esd/ALSA stuff. I don't know from the puppet
> stuff, and hope that we can live without Java too! (Do we still need that
> post-OJIectomy? I hope not, but fear that some xulrunner config will bone us.
> To the mozconfig!)
Going without scratchbox seems fine. Getting Puppet installed is a bigger task, though. I think we'll need to recreate the /tools packages - not the biggest deal, and I'm sure there'll be other things that need tweaking. Proper testing and fixing will take a non-trivial amount of time (days, if not more than a week).
We _could_ live without Puppet to start with, but that makes it harder to deploy new things, and will make us and others sad when such things are required.
> The TM crew would be willing guinea pigs for raising a 64-bit
> reference image in situ, I am fully confident, so if we can get there piecemeal
> as we enable new tests, it seems like everyone wins.
>
> The first step seems to be getting that existing single VM set up to watch
> the tm branch for builds too.
That part is pretty easy, and I think we can enable that without more VMs. The 64-bit dep builds are quite quick. If we're looking at enabling unit tests, leak tests, or anything else past that we need more VMs. We should ping IT/Phong about how much space we have for new VMs - I don't know where we're at these days.
Comment 21•15 years ago
Boy, will I ever be pinging IT about that in the morning. :-)
Thanks! Do you want another bug to track the multi-branchification of moz2-linux64-slave01? Mostly due to me, this bug is now a little noisy. :-/
Comment 22•15 years ago
(In reply to comment #21)
> Thanks! Do you want another bug to track the multi-branchification of
> moz2-linux64-slave01? Mostly due to me, this bug is now a little noisy. :-/
I filed bug 515612 for this.
Comment 23•15 years ago
And I filed bug 515616 to get a 64-bit VM cloned for TraceMonkey devs to use.
Comment 24•15 years ago
(In reply to comment #18)
> I further assert that anyone who cares about those 64-bit builds _really_ wants
> the JIT to work, so they won't mind splitting the cycles with the TM tree to
> make sure it stays in good shape.
Indeed. I should possibly clarify why this is such a crucial transition point, where any little bit will help:
The JIT actually has per-CPU *files*, not just "same code, different machine"; they don't get read or compiled at all unless you're compiling for that particular CPU. And those files are coupled, interface-wise, to the rest of the JIT, so they tend to bitrot immediately upon falling into the category of "not being built every hour". We're very close to making the transition to "being built every hour" on x64.
When we *do* that, and get it over the hump of "some vaguely-reasonable number of the JIT tests are passing" (as dvander assures us we are), it becomes non-stupid for most if not all of the tracemonkey group (who are, I believe, all working on either Snow Leopard or Ubuntu, and all on 64-bit *capable* hardware and OSes) to switch our day-to-day workspaces to x64 mode without completely crippling our ability to investigate JIT bugs. When that happens, the substantial remaining bugs will likely get fixed in a hurry and stay fixed. The 32-bit x86 tinderboxes will become the "legacy" machines.
So it's very much a "snowball-forming moment" we're talking about.
Updated•15 years ago
OS: Mac OS X → Linux
Updated•15 years ago
Whiteboard: [linux64]
Comment 25•15 years ago
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Comment 26•15 years ago
So, unittest and Talos coverage for Linux 64 is coming Real Soon Now. 64-bit builds have been on for tracemonkey for a while now.
Which tests do you want run with the JIT preffed on/off?
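(For concreteness, "pref'd on/off" would presumably just mean flipping prefs in the test profile, something like the sketch below; the javascript.options.jit.* pref names here are from memory and should be double-checked against all.js before using this.)

#!/usr/bin/env python3
import os

# Pref names are from memory of the TraceMonkey-era defaults - treat
# them as assumptions and verify against all.js.
JIT_PREFS = ("javascript.options.jit.content",
             "javascript.options.jit.chrome")

def write_profile(profile_dir, jit_enabled):
    """Create a profile whose user.js forces the JIT on or off."""
    os.makedirs(profile_dir, exist_ok=True)
    with open(os.path.join(profile_dir, "user.js"), "w") as f:
        for pref in JIT_PREFS:
            f.write('user_pref("%s", %s);\n'
                    % (pref, "true" if jit_enabled else "false"))

# One profile per configuration; the harness then runs the same suite
# twice, pointing the browser at each profile in turn.
write_profile("profiles/jit-on", True)
write_profile("profiles/jit-off", False)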
Comment 27•15 years ago
(In reply to comment #26)
> So, unittest and Talos coverage for Linux 64 is coming Real Soon Now.
> 64-bit builds have been on for tracemonkey for a while now.
Builds, Unittest and Talos are live for linux64. This work is done, so closing.
> Which tests do you want run with the JIT preffed on/off?
If you do want test suites run both with the JIT on and the JIT off, please file new "create new test" bugs, listing the specific suite names.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering