Closed Bug 697634 Opened 13 years ago Closed 12 years ago

xpcshell test failures on 10.7

Categories

(Release Engineering :: General, defect, P3)

x86_64
macOS
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: rnewman, Assigned: jhford)

References

Details

(Whiteboard: [unittest])

Attachments

(1 file, 4 obsolete files)

Attached file Logs (obsolete) —
      No description provided.
Philipp's thoughts:

Thanks for the heads-up, John! FTR at least two of our engineers are running 10.7 and aren't seeing these test failures at all. So understandably, my suspicion lies with the machine setup for now.

Looking at the failures, I see that the test_resource.js, test_resource_async.js, test_restrequest.js tests fail because the request is returning with NS_ERROR_CONNECTION_REFUSED where NS_ERROR_NET_TIMEOUT is expected. This is weird because we definitely start up a server on port 8080 there. I assume the test_service_getStorageInfo failure is just a symptom of the same inexplicable cause.

One of the other two failures is in test_TelemetryPing.js which start up an nsHttpdServer on port 4444 and then makes requests to "http://localhost:4444".

Poking in the dark: Is the firewall blocking the server from listening perhaps? If so, does 'localhost' resolve properly on this machine?
Component: Firefox Sync: Backend → Release Engineering
Product: Mozilla Services → mozilla.org
QA Contact: sync-backend → release
Version: unspecified → other
The firewall isn't up, there is a loopback that has ip 127.0.0.1.  the name 'localhost' is resolved to 127.0.0.1 by dig, nslookup and host (which use different resolvers built into osx aiui)

User processes are able to listen on both port 4444 and 8080 successfully.

In one terminal I ran 'nc -b lo0 -l 4444' and in another I ran 'nc localhost 4444'.  Input on one terminal was displayed on the other.  I did the same for port 8080, suggesting that listening on ports 4444 and 8080 work.
worth noting is that this is only happening for opt builds, which is built on 10.6 instead of on 10.5 for debug.
Here are the logs with:

function run_test() {
  Service.username = "johndoe";
  Service.password = "ilovejane";
  Service.clusterURL = "http://localhost:8080/";

  Log4Moz.repository.getLogger("Sync.AsyncResource").level = Log4Moz.Level.Trace;
  Log4Moz.repository.getLogger("Sync.Resource").level = Log4Moz.Level.Trace;
  Log4Moz.repository.getLogger("Sync.Service").level = Log4Moz.Level.Trace;
  Log4Moz.repository.getLogger("Sync.StorageRequest").level = Log4Moz.Level.Trace;
  initTestLogging();

  run_next_test();
}
Attachment #569854 - Attachment is obsolete: true
Attached file corresponding http log (obsolete) —
I see 

==========================================
Got exception starting HTTP server on port 8080
Error: 2147746065 No traceback available
Is there a process already listening on port 8080?
==========================================

I wonder what's going on with that.  I am going to reboot the machine to make sure there are no listening sockets from my tests in comment 1
Attached file run after a clean reboot (obsolete) —
Corresponding http log

90132480[103c10dd0]: STS thread init
90132480[103c10dd0]: STS poll iter [1]
90132480[103c10dd0]:   calling PR_Poll [active=0 idle=0]
90132480[103c10dd0]:     timeout = -1 milliseconds
2003777888[10381d910]: nsHostResolver::Shutdown
2003777888[10381d910]: nsSocketTransportService::Shutdown
90132480[103c10dd0]:     ...returned after 71 milliseconds
90132480[103c10dd0]: STS shutting down thread
90132480[103c10dd0]: STS thread exit
Attachment #569878 - Attachment is obsolete: true
Attachment #569880 - Attachment is obsolete: true
(In reply to John Ford [:jhford] from comment #6)

> I wonder what's going on with that.  I am going to reboot the machine to
> make sure there are no listening sockets from my tests in comment 1

I am so astonishingly pleased that I added that code.

Try using lsof to see what's listening on that port? Or perhaps there's a test ordering issue?
The previous run was done over ssh, which I think caused an interaction with the test.

This log was run through Terminal.app.  I also netstat -a | grep LISTEN and there were no listening sockets left over on 8080 or 4444

http logs:
90132480[103836870]: STS thread init
90132480[103836870]: STS poll iter [1]
90132480[103836870]:   calling PR_Poll [active=0 idle=0]
90132480[103836870]:     timeout = -1 milliseconds
1971759456[10381db10]: nsHostResolver::Shutdown
1971759456[10381db10]: nsSocketTransportService::Shutdown
90132480[103836870]:     ...returned after 71 milliseconds
90132480[103836870]: STS shutting down thread
90132480[103836870]: STS thread exit
Attachment #569885 - Attachment is obsolete: true
So, some things we know:

1) opt builds created on 10.6 using xcode 3.2.? fail test on 10.7.2 test slave
2) debug builds created on 10.5 using xcode 3.? pass sync tests on 10.7.2 test slave
3) debug builds created on 10.7 by devs using xcode 4 (?) pass test on developer machine

I suggest that we package a developer's build who is running config 3) and run that on the 10.7 test machine.  packaging can be done by running "make package && make package-tests" in the objdir.

I also suggest that a developer who is running config 3) run the following commands:
curl -LO http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1319441324/firefox-10.0a1.en-US.mac.dmg
curl -LO http://stage.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1319441324/firefox-10.0a1.en-US.tests.zip
open firefox-10.0a1.en-US.mac.dmg
cp -r /Volumes/Nightly/Nightly.app .
unzip firefox-10.0a1.en-US.tests.zip
cp bin/xpcshell ./Nightly.app/Contents/MacOS
cp -R bin/components/* ./Nightly.app/Contents/MacOS/components/
cp -R bin/plugins/* ./Nightly.app/Contents/MacOS/plugins/
python -u xpcshell/runxpcshelltests.py  --manifest=xpcshell/tests/all-test-dirs.list ./Nightly.app/Contents/MacOS/xpcshell

to see if they can replicate the problem on their local 10.7 machine which is passing their own tests.
It looks to me like there are only two problems in the log you have (Attachment 569889 [details]):

* gczeal seems to have disappeared. Naturally the test that relies on this fails, but additionally everything dies. I suspect there's a build issue there.

* Precisely one HTTP-server-using test fails.

I'm not too afraid of these being insurmountable. Additional logs for the latter would be handy. The former is out of my jurisdiction, but I'll happily take a look.
(In reply to Richard Newman [:rnewman] from comment #11)
> It looks to me like there are only two problems in the log you have
> (Attachment 569889 [details]):
> 
> * gczeal seems to have disappeared. Naturally the test that relies on this
> fails, but additionally everything dies. I suspect there's a build issue
> there.

Oh, of course. gczeal only exists in DEBUG builds. You can ignore this.

That means the only remaining failure is

TEST-UNEXPECTED-FAIL | /Users/cltbld/sync/xpcshell/tests/services/sync/tests/unit/test_service_changePassword.js | false == true - See following stack:
JS frame :: /Users/cltbld/sync/xpcshell/head.js :: do_throw :: line 453
JS frame :: /Users/cltbld/sync/xpcshell/head.js :: _do_check_eq :: line 547
JS frame :: /Users/cltbld/sync/xpcshell/head.js :: do_check_eq :: line 568
JS frame :: /Users/cltbld/sync/xpcshell/head.js :: do_check_true :: line 582
JS frame :: /Users/cltbld/sync/xpcshell/tests/services/sync/tests/unit/test_service_changePassword.js :: run_test :: line 36
JS frame :: /Users/cltbld/sync/xpcshell/head.js :: _execute_test :: line 326
JS frame :: -e :: <TOP_LEVEL> :: line 1


You can add these lines to the top of run_test in that test file to get a more useful log:

  Cu.import("resource://services-sync/log4moz.js");

  Log4Moz.repository.getLogger("Sync.AsyncResource").level = Log4Moz.Level.Trace;
  Log4Moz.repository.getLogger("Sync.Resource").level = Log4Moz.Level.Trace;
  Log4Moz.repository.getLogger("Sync.Service").level = Log4Moz.Level.Trace;
Depends on: 697883
services-central now has this additional logging for test_service_changePassword. I'd be interested in an updated log, jhford!
Richard, mind pointing me at a build with those changes on ftp?  I'll send them through the loop.  I did add those lines you asked me to.

We will be putting these machines into production very soon, which will get us tests on checkin.  Is it ok for me to get these into production then use pushes to service-central to trigger the testing?
(In reply to John Ford [:jhford] from comment #14)
> Richard, mind pointing me at a build with those changes on ftp?

Today's builds:

https://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/services-central-android/1320051374/
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/services-central-macosx64/1320051374/
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/services-central-win32/1320051374/
https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/services-central-linux/1320051374/

> I did add those lines you asked me to.

No longer necessary; I improved the test to log on its own.
 
> We will be putting these machines into production very soon, which will get
> us tests on checkin.  Is it ok for me to get these into production then use
> pushes to service-central to trigger the testing?

Sure.

Thanks for the help!
Assigning to jhford based on ongoing interaction already in the bug.
Assignee: nobody → jhford
Priority: -- → P3
Whiteboard: [unittest]
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: