Closed Bug 485436 Opened 12 years ago Closed 12 years ago

[SeaMonkey, MacOSX] All Mochitest suites fails now: "Timed out while waiting for server startup.", related to 'xrePath'

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
major

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: sgautherie, Assigned: ted)

References

Details

(Keywords: fixed1.9.1, Whiteboard: [fixed1.9.1b4])

Attachments

(1 file, 1 obsolete file)

1st failing build was:
{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1238012262.1238017547.17244.gz
MacOSX 10.4 comm-central dep unit test on 2009/03/25 13:17:42

mochitest test complete warnings
...
Server pid: 29966
Timed out while waiting for server startup.
program finished with exit code 1

mochichrome test complete warnings
...
Server pid: 29969
Timed out while waiting for server startup.
program finished with exit code 1

browserchrome test complete warnings
...
Server pid: 29972
Timed out while waiting for server startup.
program finished with exit code 1
}

Regression timeframe is:
http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?startdate=2009-03-25+10%3A09%3A44&enddate=2009-03-25+13%3A11%3A10

Unlikely to be bug 480956,
which leaves bug 470971 only.
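For reference, a regression-window URL like the one above can be assembled from two timestamps. This is only a sketch: the helper name `pushlog_url` is made up for illustration, and it assumes the `startdate`/`enddate` query parameters behave the way they appear in the URL in this comment.

```python
from urllib.parse import urlencode


def pushlog_url(repo_url, start, end):
    """Build a pushloghtml query URL for a regression window.

    start and end are 'YYYY-MM-DD HH:MM:SS' strings; urlencode turns
    spaces into '+' and colons into '%3A', as seen in the URL above.
    """
    query = urlencode({"startdate": start, "enddate": end})
    return repo_url.rstrip("/") + "/pushloghtml?" + query


url = pushlog_url(
    "http://hg.mozilla.org/releases/mozilla-1.9.1",
    "2009-03-25 10:09:44",
    "2009-03-25 13:11:10",
)
print(url)
```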
Flags: wanted1.9.1?
Callek, KaiRo already took a first look and did a clobber, but it brought no improvement.
Would you have time to give a second look to try and find what is wrong ?
(while KaiRo is away till monday.)
Last time I tried, I was unable to VNC into the Mac box (and it's been a while since I tried), but if I can find time before KaiRo gets back, yes, I'll try to.
(In reply to comment #0)
> which leaves bug 470971 only.

Though I misread this: it's xpcshell not mochitest, so pretty unlikely too...
(In reply to comment #1)
> Would you have time to give a second look to try and find what is wrong ?

The problem is that, other than rebooting the machine, I have no clue what could be done. To me this doesn't look like a machine issue but more like a tests/code issue, possibly because this machine runs Tiger/10.4 while the Firefox test machines run Leopard/10.5.
(In reply to comment #4)

Fwiw, Callek did run the test manually and saw/found nothing more than what is in the log.
Rebuilding (before rebooting) with revisions that previously worked should give us a hint whether it's a code or machine issue, shouldn't it?

NB: (unrelated, but not filing another bug)
The Windows box is now timing out on json TUnit tests ... with no new code revisions :-(
Callek tried to clobber it, which made no difference.
Unless you see something specific, that box may actually need a reboot ?
Just rebuilding doesn't work that easily, and is quite some work.
I'm pretty sure this is actually a code or test problem.
On Windows, this is not just a timeout, it's a real crash.

And it's not that easy to just get access: for one, an account for the jump server would need to be created; for another, I'm not handing out the password to all our machines that easily. There's no read-only access possible; it's either full or none.
(In reply to comment #3)
> (In reply to comment #0)
> > which leaves bug 470971 only.
> 
> Though I misread this: it's xpcshell not mochitest, so pretty unlikely too...

Mochitest is running on top of, or with, xpcshell in some fashion (see, for instance, bug 432189), so 470971 probably is really your culprit.
Hm. You must be getting bitten by something similar to what we fixed with:
http://hg.mozilla.org/mozilla-central/rev/a563bc0ec80a

If the xpcshell binary is a symlink, it will fail now that the other patch from bug 470971 has landed.
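
To check the symlink theory on the affected box, something like the following could be run against the xpcshell binary. It is a minimal sketch: `describe_binary` is a hypothetical helper, and the path passed in is whatever the build actually uses (dist/bin/xpcshell is only assumed here).

```python
import os


def describe_binary(path):
    """Report whether path is a symlink and what it finally resolves to."""
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        # realpath follows every symlink in the chain
        "resolved": os.path.realpath(path),
    }


if __name__ == "__main__":
    # Assumed location; adjust to the objdir being tested.
    print(describe_binary(os.path.join("dist", "bin", "xpcshell")))
```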
Smokey, right, we're failing to start up the local HTTP server that delivers the mochitests, and that one is actually running via xpcshell.
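
For readers unfamiliar with the failure message, a startup wait of this kind can be sketched roughly as below. This is not the harness's actual code: the function name, host, port, and timeout values are all assumptions, and the real harness may detect startup differently.

```python
import socket
import time


def wait_for_server(host, port, timeout=45.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if the timeout
    elapses first (the "Timed out while waiting" case).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or refused); wait a bit and retry.
            time.sleep(interval)
    return False
```

If the server process never gets as far as listening, as seems to be the case here, a loop like this can only ever return False.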

Ted, sounds very much possible that there is some connection, thanks. But shouldn't that fix apply to us as well? Or didn't it land on 1.9.1?
It did land on 1.9.1, as a bustage fix because I didn't realize I needed to land it there originally. It's listed in the regression window pushlog query in comment 0.
KaiRo, did you actually check the symlink issue the first time we talked and you did a clobber ?
(In reply to comment #10)
> It did land on 1.9.1, as a bustage fix because I didn't realize I needed to
> land it there originally. It's listed in the regression window pushlog query in
> comment 0.

Good, that rules out what it fixed as a possible cause.

(In reply to comment #11)
> KaiRo, did you actually check the symlink issue the first time we talked and
> you did a clobber ?

No, but it probably doesn't matter really, as the failure is still there after clobbering.
(In reply to comment #5)
> NB: (unrelated, but not filing another bug)
> The Windows box is now timing out on json TUnit tests ... with no new code
> revisions :-(
> Callek tried to clobber it, which made no difference.
> Unless you see something specific, that box may actually need a reboot ?

Rebooting did not help either.

This went from red to orange (= no more ending timeout) with
http://hg.mozilla.org/comm-central/pushloghtml?startdate=2009-03-30+22%3A16%3A51&enddate=2009-03-31+02%3A00%3A50

1)
That's odd: it (still) had
{
======== BuildStep started ========
update
[...]

make[1]: *** [check] Error 1
make: *** [check] Error 2

-(Unnoticed before...)-> WindowsError: [Error 13] The process cannot access the file because it is being used by another process: 'c:\\docume~1\\seabld\\locals~1\\temp\\1\\runxpcshelltests_leaks.log'

make[2]: *** [check] Error 1
make[1]: *** [check] Error 2
TEST-UNEXPECTED-FAIL | d:\builds\slave\comm-central-win32\build\objdir\mozilla\_tests\xpcshell\json_test\unit\test_reviver.js | test failed, see following log:
make[5]: *** [check] Error 1
make[4]: *** [check] Error 2
make[3]: *** [check] Error 2
make[2]: *** [check] Error 2

[...]
program finished with exit code 1
elapsedTime=2866.359000
pulling from http://hg.mozilla.org/comm-central
}
and yet "TUnit<br/>592/0"

Hum, it looks like some extraneous log output "leaked" into the new/current one or something :-/
That might be consistent with the fact that it took 2 reds to get the (first) "test_reviver.js" error reported!?

2)
Next build is "fine".
Ftr, would someone know which rev actually fixed this Windows issue?
I talked KaiRo through some troubleshooting, and we basically narrowed it down to this not working:
dist/bin/xpcshell -g dist/SeaMonkey.app/Contents/MacOS ...

Without the -g argument, or with -g dist/bin, it works. I think we can work around this in the Mochitest harness.
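
The three invocations above can be written down as argument lists, which is handy for reproducing the comparison by hand. A sketch only: `xpcshell_command` is a hypothetical helper, and the trailing arguments elided as "..." above are deliberately left out.

```python
import os


def xpcshell_command(xpcshell, g_dir=None, extra_args=()):
    """Return an argv list for xpcshell, adding -g only when g_dir is given."""
    cmd = [xpcshell]
    if g_dir is not None:
        cmd += ["-g", g_dir]
    cmd += list(extra_args)
    return cmd


dist_bin = os.path.join("dist", "bin")
xpcshell = os.path.join(dist_bin, "xpcshell")
app_dir = os.path.join("dist", "SeaMonkey.app", "Contents", "MacOS")

failing = xpcshell_command(xpcshell, app_dir)      # reported as not working
working = [
    xpcshell_command(xpcshell),                    # no -g at all
    xpcshell_command(xpcshell, dist_bin),          # -g dist/bin
]
```

Each list can be passed to `subprocess.call` together with whatever arguments the harness normally appends.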
Attached patch: "patch for testing" (obsolete)
Kairo, if you could test this (or if someone else could test this on 10.4) I would appreciate it.
Comment on attachment 370234
patch for testing

>+    if options.app != parser.get_option("--appname").default:
>+      options.xrePath = os.path.dirname(options.app)
>+    else:
>+      # otherwise default to dist/bin
>+      options.xrePath = automation.DIST_BIN

Nit: could use '=='...
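
Written with '==' as suggested, the defaulting logic would look roughly like this. It is a sketch under assumptions: the parser setup, the `--xre-path` option, `DEFAULT_APP`, and the `DIST_BIN` constant standing in for automation.DIST_BIN are illustrative, not copied from the patch.

```python
import os
from optparse import OptionParser

# Hypothetical stand-ins for automation.DIST_BIN and the default app path.
DIST_BIN = os.path.join("dist", "bin")
DEFAULT_APP = os.path.join(DIST_BIN, "seamonkey")


def build_parser():
    parser = OptionParser()
    parser.add_option("--appname", dest="app", default=DEFAULT_APP)
    parser.add_option("--xre-path", dest="xrePath", default=None)
    return parser


def resolve_xre_path(parser, options):
    """Pick xrePath: explicit value, else dist/bin for the default app,
    else the directory containing the app binary."""
    if options.xrePath is not None:
        return options.xrePath
    if options.app == parser.get_option("--appname").default:
        return DIST_BIN
    return os.path.dirname(options.app)
```

Note that with this logic any explicitly passed --appname takes the `os.path.dirname` branch, so the result depends on how the harness is invoked.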
(In reply to comment #15)
> Created an attachment (id=370234) [details]
> patch for testing
> 
> Kairo, if you could test this (or if someone else could test this on 10.4) I
> would appreciate it.

The just started "MacOSX 10.4 comm-central dep unit test" cycle on the SeaMonkey tinderbox waterfall is using this patch, let's see if it ends up actually executing mochitests.
(In reply to comment #17)
> (In reply to comment #15)
> > Created an attachment (id=370234) [details]
> > patch for testing
> > 
> > Kairo, if you could test this (or if someone else could test this on 10.4) I
> > would appreciate it.
> 
> The just started "MacOSX 10.4 comm-central dep unit test" cycle on the
> SeaMonkey tinderbox waterfall is using this patch, let's see if it ends up
> actually executing mochitests.

Does not seem to work, still a timeout.
(In reply to comment #18)
> Does not seem to work, still a timeout.

OK, I think I know why that is: we are not using the make-driven steps yet, so we are calling runtests.py directly and passing --appname there.
After consulting with Kairo, take 2.
Assignee: nobody → ted.mielczarek
Attachment #370234 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Comment on attachment 370433
patch for testing, take 2

r=me in terms of this solving the problem
Attachment #370433 - Flags: review+
Comment on attachment 370433
patch for testing, take 2

Waldo: this fixes Mochitest in the default configuration on 10.4 by working around a bug that we don't really understand and that would probably require a lot of debugging. I don't think this is a hack; it just changes the way we get the default value of xrePath.
Attachment #370433 - Flags: review?(jwalden+bmo)
Attachment #370433 - Flags: review?(jwalden+bmo) → review+
Pushed to m-c:
http://hg.mozilla.org/mozilla-central/rev/606acfb74fb5

I'll get it on 1.9.1 after a cycle on trunk.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Flags: wanted1.9.1? → in-testsuite-
Whiteboard: [fixed1.9.1b4]
Target Milestone: --- → mozilla1.9.2a1
Summary: [SeaMonkey, MacOSX] All Mochitest suites fails now: |Timed out while waiting for server startup.| → [SeaMonkey, MacOSX] All Mochitest suites fails now: "Timed out while waiting for server startup.", related to 'xrePath'