Closed Bug 892107 Opened 12 years ago Closed 12 years ago

Please loan a 10.8 mac tester to :smichaud

Categories (Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task

Priority: Not set

Severity: normal

Tracking (Not tracked)

Status: RESOLVED FIXED

People (Reporter: hwine, Assigned: smichaud)

Details (Whiteboard: [buildduty])

Reporter

Description

•

12 years ago

See bug 884471 comment 49 for context

Justin Wood (:Callek)

Comment 1

•

12 years ago

I'm going to loan a 10.8 instead of a 10.7, since the 10.8s are still a managed class and :smichaud doesn't care one way or another.

Summary: Please loan a 10.7 mac tester to :smichaud → Please loan a 10.8 mac tester to :smichaud

Justin Wood (:Callek)

Updated

•

12 years ago

Depends on: 892146

Justin Wood (:Callek)

Comment 2

•

12 years ago

Host is loaned, please see your e-mail for password and VPN information. Please reassign this bug to "nobody@" when you are done with it, and releng will reclaim it.

Assignee: nobody → smichaud

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

•

12 years ago

Given a fresh box, you will want to download the Firefox binary and tests package. I highly recommend looking at a log file from tbpl to get the exact steps. Here is a full log (m-c 10.8 debug, m1):

https://tbpl.mozilla.org/php/getParsedLog.php?id=25165117&tree=Mozilla-Central&full=1

Inside the log, you can find the steps that are run by searching for 'Copy/paste'. You can find what is downloaded by searching for 'Downloading'. Please let me know if you are able to use the log to find the steps and get tests running. It is easy to put that in a loop, but we might want to do something smarter.
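The log search described above can be sketched in a few lines of Python. This is only an illustration: the two marker strings come from the comment, but the function name, the saved-log file argument, and the assumption that each step sits on a single line are hypothetical.

```python
import sys
from typing import Dict, Iterable, List, Sequence

# Marker strings quoted in the comment above. How the surrounding log
# lines are laid out is an assumption, not something this bug documents.
MARKERS = ("Copy/paste", "Downloading")


def find_marked_lines(
    lines: Iterable[str], markers: Sequence[str] = MARKERS
) -> Dict[str, List[str]]:
    """Group log lines by the first marker string they contain."""
    found: Dict[str, List[str]] = {m: [] for m in markers}
    for line in lines:
        for marker in markers:
            if marker in line:
                found[marker].append(line.rstrip("\n"))
                break  # count each line under one marker only
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_steps.py saved_full_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for marker, hits in find_marked_lines(log).items():
            print(f"== {marker} ({len(hits)} lines) ==")
            for hit in hits:
                print(hit)
```

Saving the full log to a local file first and passing its path keeps the sketch independent of how the log is fetched.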

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 4

•

12 years ago

I hope and assume that this *isn't* a fresh box. I understood it's a production machine, on which bug 884471 has already happened, that's been taken offline. But I do need to understand how the test machines work, and any help in that direction is appreciated. I currently have zero knowledge of this, so it will take me several days (at least) to learn it on my own.

Justin Wood (:Callek)

Comment 5

•

12 years ago

To be clear, this is not a "fresh box"; it was a production machine running production jobs before I pulled it for you. However, I did not verify that this *specific* box hit the issue; I only verified that this *class* of boxes hit the issue. All machines in this class are imaged and set up identically, and have identical hardware.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 6

•

12 years ago

OK, then. I think the first step should be to reproduce bug 884471 on my loaner, before I start trying to load my interpose library. I'd like to be able to run tests continuously until bug 884471 happens, then have them stop. The tests should, I suppose, be some subset of the mochitests in which the bug is known to happen -- e.g. mochitest-1, mochitest-other and so forth.
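A minimal sketch of "run continuously until the bug happens, then stop", assuming the test run can be started as a single command line. The failure check here is just a non-zero exit code, which is a placeholder: the actual signature of bug 884471 is not given in this thread, so `exit_code_failed` and the command passed on the command line are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List, Optional


def exit_code_failed(returncode: int, output: str) -> bool:
    """Placeholder predicate: treat any non-zero exit code as the failure."""
    return returncode != 0


def run_until_failure(
    cmd: List[str],
    is_failure: Callable[[int, str], bool] = exit_code_failed,
    max_runs: Optional[int] = None,
) -> Optional[int]:
    """Run cmd repeatedly and stop at the first failing run.

    Returns the 1-based number of the failing run, or None if max_runs
    runs completed without a failure. With max_runs=None the loop only
    ends when a failure is seen.
    """
    run = 0
    while max_runs is None or run < max_runs:
        run += 1
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            text=True,
        )
        if is_failure(proc.returncode, proc.stdout):
            return run
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python loop.py <test command and its arguments>
    first_failure = run_until_failure(sys.argv[1:])
    print(f"first failing run: {first_failure}")
```

Passing a predicate that searches the captured output for a known error string would make the loop stop on that specific failure rather than on any failing run.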

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 7

•

12 years ago

By the way, I'm having trouble getting VPN to work properly from Mountain Lion (on my side), using Tunnelblick. Apparently you need to use a beta version of Tunnelblick to avoid it reconnecting every few minutes, and the beta uses a different settings format, and doesn't know how to convert from the old format (even though it claims to be able to). So I'll need to fix *that* problem before I can start work on the loaner :-(

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 8

•

12 years ago

I've now fixed my VPN problems (at least to the extent of being able to ssh in to my loaner). Thanks, Callek, for telling me to use Viscosity, that the "Mozilla VPN" requires different settings, and where to download these settings from. (I still can't screen-share, but I hope that won't be necessary.)

Now I need to learn the basics about how our build system works, starting from no knowledge whatsoever. Are there any docs on this (globally available, or only available via VPN)? What software does the build system use (apart from what's included in the Mozilla tree)? For each of these packages, where's the source, plus any docs that may exist?

I looked at https://tbpl.mozilla.org/php/getParsedLog.php?id=25165117&tree=Mozilla-Central&full=1, and see from it that /builds/slave/talos-slave/test/ seems to be its working directory. I also see that my loaner already has this directory. What program produced this log? Is it feasible that I just run this program (whatever it is)? Please give me whatever information you can about it, including where to find its source and whatever docs may exist.

Justin Wood (:Callek)

Updated

•

12 years ago

Blocks: talos-mtnlion-r5-002

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 9

•

12 years ago

I need answers to my questions in comment #8. Without them I can't begin work on my loaner, or on bug 884471. I don't even know who to ask, so I've picked three names more or less out of a hat. Answer what questions you can, and needinfo whoever you think might be able to help with the rest.

Flags: needinfo?(jmaher)

Flags: needinfo?(gps)

Flags: needinfo?(bhearsum)

bhearsum@mozilla.com (:bhearsum)

Updated

•

12 years ago

Flags: needinfo?(bhearsum)

Joel Maher ( :jmaher ) (UTC -8)

Comment 10

•

12 years ago

buildbot has a slave which runs on the client and executes the commands you see in the log file. That slave is the process which generates the log file which you end up seeing. 99.31% of the time you can reproduce failures by running those commands by hand!

Flags: needinfo?(jmaher)

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 11

•

12 years ago

(In reply to comment #10) I need more information than that! Where is "buildbot"? Where is "slave"? How do I run one or both of them? Where's the source for them? Are there any docs, and if so where?

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 12

•

12 years ago

So it looks like I'll need to find the information I need on my own. I Googled "buildbot site:mozilla.org" and found this:

https://wiki.mozilla.org/Buildbot

I'll keep digging. In a day or two I should be ready. Note that I'm on vacation next week (7-22 through 7-26), and won't be working then.

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

12 years ago

I don't think you will be able to launch commands from buildbot on there. Earlier on in the other bug you were going to download and run the tests on there; why are you talking about doing something else? All you need to do is cut/paste the commands that you see in the log and run them. If you need help hacking the harness to repeat until failure, I would be happy to help with that.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 14

•

12 years ago

As I've said multiple times here and in bug 884471, I need to run the tests *exactly* as they're run on the production test machines -- or as close to that as I can get. I strongly suspect that something in our build infrastructure is the cause of bug 884471 -- something that isn't run by "ordinary" builders.

Steven Michaud [:smichaud] (Retired)

Assignee

Updated

•

12 years ago

Flags: needinfo?(gps)

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

•

12 years ago

It is your call, but setting up a buildbot master and configuring it exactly like the ones that run the tests will take a lot of your time. Why are you opposed to running the tests in the same environment (sans buildbot slave script) 100 times to look for the failure? If there is no failure, then it would seem worth the many man hours to do it via buildbot.
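One way to reason about the "100 times" suggestion: if each run is treated as independent and hits the bug with some probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below only does that arithmetic. The value of p used in the example is made up for illustration; the real failure rate of bug 884471 is not stated in this thread, and independence between runs is an assumption.

```python
import math


def prob_at_least_one_failure(p: float, n: int) -> float:
    """Probability of at least one failure in n independent runs,
    each failing with probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("need 0 <= p <= 1 and n >= 0")
    return 1.0 - (1.0 - p) ** n


def runs_needed(p: float, confidence: float) -> int:
    """Smallest n for which prob_at_least_one_failure(p, n) >= confidence."""
    if not 0.0 < p < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 < p < 1 and 0 < confidence < 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-run failure rate, for illustration only
    print(f"P(at least one failure in 100 runs) = {prob_at_least_one_failure(p, 100):.3f}")
    print(f"runs needed for 95% confidence = {runs_needed(p, 0.95)}")
```

Under these assumptions, a clean batch of 100 runs is only weak evidence that the bug cannot be reproduced by hand if the per-run failure rate is low.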

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 16

•

12 years ago

> Why are you opposed to running the tests in the same environment
> (sans buildbot slave script) 100 times to look for the failure?

I'm not, in principle. Though I think it's probably a fool's errand -- if that was sufficient, why don't people see these failures when running tests locally? Is there an easy way to do this, that doesn't involve changing how my loaner is set up? If so, let me know and I'll try it.

Joel Maher ( :jmaher ) (UTC -8)

Comment 17

•

12 years ago

I have already told you how to do this, and 99%+ of the time when I run a test on a loaner box by hand the error reproduces. I have no more information to give you, this is becoming a circular argument. If you need help hacking the harness, I will be more than happy to assist. If you do have a question for me, please need-info! Good luck and have fun!

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 18

•

12 years ago

> I have already told you how to do this

Where?

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 19

•

12 years ago

> Where?

You mean here, in comment #3? I'll take a closer look, and see what I can glean from what you said.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 20

•

12 years ago

I think I've found a way to run desktop_unittest.py on my loaner (which is a slave) without any connection to a master (running buildbot):

https://wiki.mozilla.org/ReleaseEngineering/Mozharness/07-May-2013?title=ReleaseEngineering/Mozharness

Anyone willing to comment? Whether or not anyone is, I'll try it tomorrow and see what happens.

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 21

•

12 years ago

(Following up comment #20)

That strategy works, more or less. But the test run aborts before any individual tests are run, with the following error:

INFO - _RegisterApplication(), FAILED TO establish the default connection to the WindowServer, _CGSDefaultConnection() is NULL.

I take this to mean that I need to screen-share with my loaner, which I haven't yet been able to get working. (Up to this point I've been ssh-ing in to the machine.) Looks like I'll have to pick this up again the week after next, after I get back from vacation.

Chris Cooper [:coop] (he/him)

Updated

•

12 years ago

Component: Release Engineering: Machine Management → Release Engineering: Loan Requests

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 22

•

12 years ago

Thanks to Kim Moir, I've now got screen sharing and desktop_unittest.py working on my loaner! Next I'll try to reproduce bug 884471 on it. This may take a while -- if only because I'll have to rerun tests so many times. But my path is no longer blocked, and I should eventually be able to try out my interpose library.

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 23

•

12 years ago

I'm (basically) done with bug 884471. But I'm going to hang on to this loaner for a bit longer (a week or two) to work on bug 898519.

Justin Wood (:Callek)

Comment 24

•

12 years ago

(In reply to Steven Michaud from comment #23)
> I'm (basically) done with bug 884471. But I'm going to hang on to this
> loaner for a bit longer (a week or two) to work on bug 898519.

3 months later

Steven, are you still using it for 898519? Note that if you're not, but plan to, I'd prefer to reclaim this to our pool and loan you a new one when you are once again able to devote time. But if you are actively using it, I'm happy to leave it in your hands.

Flags: needinfo?(smichaud)

Steven Michaud [:smichaud] (Retired)

Assignee

Comment 25

•

12 years ago

(In reply to comment #24) Oops, I'd completely forgotten about this :-( Go ahead and reclaim this machine. In any case I'm probably not going to be working on bug 898519 anytime soon: I've got lots of other stuff to do, and bug 898519 is less urgent now that the tests that trigger it are disabled.

Flags: needinfo?(smichaud)

Justin Wood (:Callek)

Comment 26

•

12 years ago

reclaiming

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Loan Requests → Buildduty

Product: Release Engineering → Infrastructure & Operations

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
