Open Bug 561350 Opened 14 years ago Updated 2 years ago

test_punycodeURIs.js fails (was timing out) on Mac xpcshell test

Categories

(Firefox :: File Handling, defect, P4)

All
macOS
defect

Tracking

()

People

(Reporter: lsblakk, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [10.6][puppet][buildslaves][test disabled by bug 599475])

Attachments

(5 files, 2 obsolete files)

currently running 10.6 unittests in production, the xpcshell tests get to:

TEST-PASS | /Users/cltbld/talos-slave/mozilla-central-snowleopard-opt-u-xpcshell/build/xpcshell/tests/test_uriloader_exthandler/unit/test_handlerService.js | test passed

and then consistently time out before/while running test_punycodeURIs.js

possibly related to bug 456606?
this is also consistently occurring on the debug xpcshell test runs for 10.6
Mark, I don't suppose you have access to a 10.6 machine? If not, I just recently updated to 10.6, so I'm happy to debug-by-proxy for you once I dig out from my backlog.
Blocks: 378991
Ok, I never did fix the possibility for a hang in the failure case for the test. gCheckExistsAttempts never gets incremented, so it just loops. This patch will fix that.

What that does mean is that either WriteArgument isn't getting run (or can't be for some reason) and hence not producing the expected output file, or the output file is going to the wrong place (unlikely, since that's controlled by an env var from the test).

I can take a look, although, I haven't got much spare time. I'm sure someone with a 10.6 box could initially try something like:

export WRITE_ARGUMENT_FILE=test.txt WriteArgument http://url.test/

and see if the output file gets created or not.

If it does get created, then it points to an issue with the app running the executable on 10.6 (e.g. there's some DYLD_LIBRARY_PATH hacks in the test).

If it doesn't get created, then the issue is WriteArgument, but I expect that is unlikely from the simplicity of that code...
Attachment #441490 - Flags: review?
Attachment #441490 - Flags: review? → review?(ted.mielczarek)
Comment on attachment 441490 [details] [diff] [review]
[checked in] Fix the hang, probably not the test

FWIW, feel free to land simple test fixes like this without review.
Attachment #441490 - Flags: review?(ted.mielczarek) → review+
Comment on attachment 441490 [details] [diff] [review]
[checked in] Fix the hang, probably not the test

Checked in: http://hg.mozilla.org/mozilla-central/rev/014349e11696

Where can the results be viewed?

I expect the tests will still fail but they at least shouldn't hang now.
Attachment #441490 - Attachment description: Fix the hang, probably not the test → [checked in] Fix the hang, probably not the test
The 'Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central debug test xpcshell' column of http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox&noignore=1
Thanks, I always get that parameter wrong.

So now the test just fails (as expected) and the rest of the tests pass. The test failure is:

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/mozilla-central-snowleopard-debug-u-xpcshell/build/xpcshell/tests/test_uriloader_exthandler/unit/test_punycodeURIs.js | Expected File /Users/cltbld/talos-slave/mozilla-central-snowleopard-debug-u-xpcshell/build/xpcshell/tests/test_uriloader_exthandler/unit/result.txt does not exist after 30 seconds

So now the way to resolve what's happening is for someone with access to a 10.6 box to try out my suggestions in comment 3.
Summary: test_punycodeURIs.js is timing out on 10.6 opt xpcshell test → test_punycodeURIs.js fails (was timing out) on 10.6 opt xpcshell test
(In reply to comment #3)
> I can take a look, although, I haven't got much spare time. I'm sure someone
> with a 10.6 box could initially try something like:
> 
> export WRITE_ARGUMENT_FILE=test.txt WriteArgument http://url.test/
> 
> and see if the output file gets created or not.

smontagu says "it does"
(In reply to comment #8)
> (In reply to comment #3)
> > I can take a look, although, I haven't got much spare time. I'm sure someone
> > with a 10.6 box could initially try something like:
> > 
> > export WRITE_ARGUMENT_FILE=test.txt WriteArgument http://url.test/
> > 
> > and see if the output file gets created or not.
> 
> smontagu says "it does"

Ok, so either the test isn't calling WriteArgument (probable) or the test file is ending up in the wrong place (unlikely).

In any case, I think this needs someone with a 10.6 system to sit down and go through the test debugging it.
I ran this a zillion times in a row (both in packaged and non-packaged configurations) and I can't get it to fail on my MBP.
(In reply to comment #10)
> I ran this a zillion times in a row (both in packaged and non-packaged
> configurations) and I can't get it to fail on my MBP.

My suspicion therefore would be some system configuration issue.
This also seems to hang on the new Win7 boxes (after having fixed bug 562967 so that this runs now)...

TEST-INFO | c:\talos-slave\mozilla-central-win7-opt-u-xpcshell\build\xpcshell\tests\test_uriloader_exthandler\unit\test_punycodeURIs.js | running test ...

command timed out: 1200 seconds without output

One thought from looking at the test is that |localHandler| will go out out of scope and can be GC'd while waiting for the external app to finish. Though that might be ok, looks like nsIProcess adds itself as an xpcom-shutdown observer, so that bit will hang around even though other bits may not.

Of course, the test works just fine on my Win7 box. Sigh.
armen, could we make a machine available for standard8 to use to diagnose the failure?
Standard8 ping me on IRC or file a bug like bug 578372 so I can get you a machine for you to work on.
I need to know that you (or whoever works on it) is ready to jump on and start working on it.
This is what is going on. Check out the screen shot.
The right attachment.
Attachment #459879 - Attachment is obsolete: true
Awesome, no wonder I couldn't reproduce on my local builds.

It looks like you can disable that dialog with some hackery:
http://www.davinian.com/os-x-leopard-are-you-sure-you-want-to-open-it/
http://www.macosxhints.com/article.php?story=2009081808315511
Ok, so the Windows 7 issue is a separate issue from this bug, and is actually the same PGORT80.DLL issue as the test failure in bug 578448 (to be fixed by bug 542504).

So on OS X, why does this failure not occur on the existing OS X boxes (32 bit, 10.5)? I thought warning was added with 10.5, do we already have a workaround on the existing boxes? What's different?

[I notice that one of Teds's links says VerifiedDownloadPlugin could be involved, but we have code that should be explicitly blocking this plugin from the browser (mxr it).]
This is an xpcshell test, there's no browser involved. Presumably it's just some hook when you exec() a file with this magic attribute.
Assignee: nobody → joshmoz
If I understand correctly, the problem is that the machine is downloading a package containing the test executable "WriteArgument" and "WriteArgument" gets flagged for quarantine. We get a timeout because attempting execution throws up a dialog that has to be clicked through in order to actually start execution.

I don't think there is anything we can do on the build end of things to avoid that since the flag gets set when the file is downloaded, so we should probably disable quarantine on the Mac OS X image we use. That should be possible with this command:

defaults write com.apple.LaunchServices LSQuarantine -bool NO

Why doesn't this happen when we executed downloaded builds of Minefield/Firefox for testing?
Hardware: x86 → All
Assignee: joshmoz → nobody
Component: File Handling → Release Engineering
Product: Core → mozilla.org
QA Contact: file-handling → release
Version: Trunk → other
->RelEng since the current theory is that we need a configuration change to our 10.6 image.
(In reply to comment #21)
> Why doesn't this happen when we executed downloaded builds of Minefield/Firefox
> for testing?

My current theory is that we're calling the firefox-bin executable direct rather than using Launch Services or open on the full package.
working on the puppet patch to apply the config change to 10.6 buildslaves
Status: NEW → ASSIGNED
Whiteboard: [10.6][puppet][buildslaves]
Assignee: nobody → bear
testing to see if the Snow Leopard version of this will also work on Leopard (10.6 is more verbose than 10.5)
Hmm, does 10.5 even have that dialog?
(In reply to comment #27)
> Hmm, does 10.5 even have that dialog?

yes, it has only the most basic type "internet download". 10.6 expands on it and allows you to specify content/mimetype
Comment on attachment 461372 [details] [diff] [review]
puppet change to set OS X preference to disable the "this has been downloaded" dialog

Do we need this on 10.6 test machines, too?
(In reply to comment #29)
> Comment on attachment 461372 [details] [diff] [review]
> puppet change to set OS X preference to disable the "this has been downloaded"
> dialog
> 
> Do we need this on 10.6 test machines, too?

yes, did I not make the change in the general osx part?
(In reply to comment #30)
> (In reply to comment #29)
> > Comment on attachment 461372 [details] [diff] [review] [details]
> > puppet change to set OS X preference to disable the "this has been downloaded"
> > dialog
> > 
> > Do we need this on 10.6 test machines, too?
> 
> yes, did I not make the change in the general osx part?

There's no overlap between build and test machines, unfortunately. You'll need to modify talos_osx.pp.
(In reply to comment #31)
> (In reply to comment #30)
> > (In reply to comment #29)
> > > Comment on attachment 461372 [details] [diff] [review] [details] [details]
> > > puppet change to set OS X preference to disable the "this has been downloaded"
> > > dialog
> > > 
> > > Do we need this on 10.6 test machines, too?
> > 
> > yes, did I not make the change in the general osx part?
> 
> There's no overlap between build and test machines, unfortunately. You'll need
> to modify talos_osx.pp.

drat - good catch.  i'll adjust the patch
added preference plist to talos_osx manifest as well as normal osx manifest
Attachment #461372 - Attachment is obsolete: true
Attachment #461529 - Flags: review?(bhearsum)
Attachment #461372 - Flags: review?(bhearsum)
Comment on attachment 461529 [details] [diff] [review]
puppet change to set OS X preference to disable the "this has been downloaded" dialog

This looks fine. I'd like to have a look at the contents of the files before this lands in production, though.
Attachment #461529 - Flags: review?(bhearsum) → review+
Can we fix this ASAP or at least hide the 64 bit builders on try? Bug 488597 has been spammed multiple times because of this.
(In reply to comment #35)
> Can we fix this ASAP or at least hide the 64 bit builders on try? Bug 488597
> has been spammed multiple times because of this.

this is landing either this evening or first thing in the morning - so yes :)
We have hit this on beta 3 build 3.

What is left to be done in here?
We currently have xpcshell hidden on tinderbox.
I went to land this twice and each time we were in the middle of a release cycle. I can look at landing this today, but i'm hesitant to make a change that touches *all* mac builders during a release
os/osx.pp
os/talos_osx.pp
committed changeset 197:98bd9637752b

landed on staging and production puppet servers.

please let me know if you see any oddness :)
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
http://tests.themasta.com/tinderboxpushlog/?noignore=1 shows this is not actually fixed
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
To be clear, in the URL in the previous comment, you should be looking at the "Xo" and "Xd" builds on any "OS X64" line.
Whiteboard: [10.6][puppet][buildslaves] → [10.6][puppet][buildslaves][orange]
Did these builds get unhidden? Doesn't make sense to repeatedly star a known perma-orange.
Is the perma orange caused by something other than the file dialog prompt I'm trying to fix?
They never were hidden on e10s/tracemonkey/try, it's just that virtually nobody ever stars there. We can hide them there, too, but that increases our risk of picking up yet another 10.6 failure from one of those sources (in my happy imagination, where people working on those trees actually do look at every orange, they just never star any of them).

And nobody knows whether there's something else besides the dialog, or whether the dialog isn't actually fixed, because the only way this failure can be reproduced is by running (downloaded) packaged tests, which nobody but releng does. So now we either need you or Armen to run xpcshell while watching it, to see whether the same dialog or a different dialog or something else or nothing pops up, or we need to go through the hassle of making a machine available to a developer (who isn't going to have any idea how to run packaged tests, or where to get them, and is going to have even more questions about how to do it than you).
I've gotten from armenzg the commands to run, so each time you see an orange let me now - i'll go to that slave and run the test live and see what the issue is.

after confirming that the os fix is ok, then we can chew on what else could be the issue
philor: pretty sure that's a fantasy world, and we should just hide this on e10s/tm.
I'm pretty sure your world, where it really matters whether we hide it in places that have gotten between zero and three starrings when it's been failing every single run for months, is fantasy too, but,

Hidden on tracemonkey, try, and electrolysis.
Summary: test_punycodeURIs.js fails (was timing out) on 10.6 opt xpcshell test → test_punycodeURIs.js fails (was timing out) on 10.6 xpcshell test
I am looking at this now on talos-r3-snow-003 and I saw another dialog besided the one I fixed earlier - this one deals with launching (the other had dealt with downloading.) - working on an update
Depends on: 586567
And hidden on Maple, Birch, and Cedar.
Blocks: 571367
No longer blocks: 571367
Blocks: 438871
Is this still a problem ? I can't find any orange on 
  Rev3 MacOSX Snow Leopard 10.6.2 mozilla-central opt test xpcshell
for the last few days. Lots until 09/27, one on 09/30 and 10/01 then green since.
Thanks to the great vacation mind-wipe effect I am (un)happy to reveal that in fact it was me
  http://hg.mozilla.org/mozilla-central/rev/6eb42326798b
IIRC the test was crashing rather than throwing a dialog up.
This is what you get if you manually run the xpcshell tests in uriloader/exthandler/tests/unit* with test_punycodeURIs.js re-enabled. I didn't see any dialog on the slave, it looks like a test failure to me.
Attachment #484523 - Attachment mime type: application/octet-stream → text/plain
reminded me that I haven't been able to focus on this and android tests at the same time - so i'm un-assigning myself
Assignee: bear → nobody
I don't know how to read that log for the real error, but there's no /Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/MinefieldDebug.app/Contents/MacOS/components/libxpcomsample.dylib, and the only libxpcomsample.dylib to be found in .../build/ is a dir in symbols/. Which all suggests we're not packaging a file. Passing back to Standard8 for investigation.
Assignee: nobody → bugzilla
The xpcomsample thing is just the normal noises in here - I think bug 589666, but that's spewed all through all sorts of logs. The meat is the NS_ERROR_FAILURE from nsILocalHandlerApp.launchWithURI, but I sort of think that when bear first killed the second dialog, I traced through launchWithURI, and it looked like that would only be coming through from the OS.
I've just looked at this on a builder that we're currently setting up for Thunderbird.

From what I can tell WriteArgument is being called correctly, and the executable runs, however, ::LSOpenFromURLSpec is returning '0' on 10.5 and '-10811' on 10.6.

The Mac documentation says -10811 is "kLSNotAnApplicationErr": "The item to be registered is not an application.".

I don't understand this. Either this is something specific to 10.6, or it could be the fact that it 10.6 doesn't like the binary for some reason - assuming it was the binary built on 10.5 and not 10.6. The stranger thing is that the executable does seem to run on both 10.5 and 10.6.

So I don't really understand what is happening and I think someone closer to Mac and with a 10.6 machine and running packaged tests may be better to look at this - as well as the fact that I don't really have time to dig into it further at the moment.
Assignee: bugzilla → nobody
Thanks for looking at it Mark. Any ideas on that comment Josh ?
(In reply to comment #71)
> Thanks for looking at it Mark. Any ideas on that comment Josh ?

Josh: ping? ^^
Priority: -- → P3
This test worked on both 10.5 and 10.6, opt and debug, on a try push based on
 http://hg.mozilla.org/mozilla-central/rev/9ba5e353af1b   (Nov 22)
It is now failing 3 of 4 on
 http://hg.mozilla.org/mozilla-central/rev/20c11e13d929
(bug 599475 comment #13 thru to 17). So it's still disabled. Failure details:

10.6 debug - http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292318683.1292319660.24706.gz&fulltext=1
TEST-INFO | /Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | running test ...
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
### XPCOM_MEM_LEAK_LOG defined -- logging leaks to /var/folders/H5/H5TD8hgwEqKq9hgKlayjWU+++TM/-Tmp-/tmp4pjzd4/runxpcshelltests_leaks.log
nsNativeModuleLoader::LoadModule("/Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/MinefieldDebug.app/Contents/MacOS/components/libxpcomsample.dylib") - load FAILED, rv: 80004005, error:
	<unknown; can't get error from NSPR>
*** HandlerServiceTest: getFile: requesting UMimTyp
*** HandlerServiceTest: getFile: requesting CurProcD
*** HandlerServiceTest: the following NS_ERROR_FAILURE exception in nsIDirectoryServiceProvider::getFile is expected, as we don't provide the 'CurProcD' file
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-PASS | /Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | [run_test : 144] http://xn--wgv71a309e.jp/ == http://xn--wgv71a309e.jp/
TEST-UNEXPECTED-FAIL | (xpcshell/head.js) | [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsILocalHandlerApp.launchWithURI]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: /Users/cltbld/talos-slave/mozilla-central_snowleopard-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js :: run_test :: line 147"  data: no]
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /builds/slave/mozilla-central-macosx64-debug/build/xpcom/base/nsExceptionService.cpp, line 197
WARNING: Failed to get XPConnect?!: 'rts', file /builds/slave/mozilla-central-macosx64-debug/build/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: Failed to get XPConnect?!: 'rts', file /builds/slave/mozilla-central-macosx64-debug/build/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: Failed to get XPConnect?!: 'rts', file /builds/slave/mozilla-central-macosx64-debug/build/xpcom/base/nsCycleCollector.cpp, line 3415
WARNING: OOPDeinit() without successful OOPInit(): file /builds/slave/mozilla-central-macosx64-debug/build/toolkit/crashreporter/nsExceptionHandler.cpp, line 1742
nsStringStats
 => mAllocCount:           1507
 => mReallocCount:           99
 => mFreeCount:            1507
 => mShareCount:           6812
 => mAdoptCount:             58
 => mAdoptFreeCount:         58
<<<<<<<

10.6 opt - http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292319921.1292320728.31002.gz&fulltext=1 - abbreviated version of 10.6 debug

10.5 debug - http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292323661.1292324905.20847.gz - green
TEST-INFO | /Users/cltbld/talos-slave/mozilla-central_leopard-o-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | running test ...
TEST-PASS | /Users/cltbld/talos-slave/mozilla-central_leopard-o-debug_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | test passed

10.5 opt - http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292319925.1292320567.30260.gz&fulltext=1
TEST-INFO | /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | running test ...
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
*** HandlerServiceTest: getFile: requesting UMimTyp
*** HandlerServiceTest: getFile: requesting CurProcD
*** HandlerServiceTest: the following NS_ERROR_FAILURE exception in nsIDirectoryServiceProvider::getFile is expected, as we don't provide the 'CurProcD' file
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-PASS | /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js | [run_test : 144] http://xn--wgv71a309e.jp/ == http://xn--wgv71a309e.jp/
TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | (xpcshell/head.js) | test 2 finished
TEST-INFO | (xpcshell/head.js) | running event loop
TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/head.js | exception thrown from do_timeout callback: [Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFileInputStream.init]"  nsresult: "0x80520012 (NS_ERROR_FILE_NOT_FOUND)"  location: "JS frame :: /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/tests/uriloader/exthandler/tests/unit/test_punycodeURIs.js :: checkFile :: line 71"  data: no] - See following stack:
JS frame :: /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/head.js :: do_throw :: line 439
JS frame :: /Users/cltbld/talos-slave/mozilla-central_leopard_test-xpcshell/build/xpcshell/head.js :: <TOP_LEVEL> :: line 155
TEST-INFO | (xpcshell/head.js) | exiting test
<<<<<<<
This has a "[Exception... "Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIFileInputStream.init]" not seen on 10.6.
Situation's unchanged, other than that we've added in 10.7 which fails like 10.6.
Assignee: nobody → armenzg
Priority: P3 → P4
Assignee: armenzg → nobody
Is this still an issue? AFAICT bug 706751 is the only issue on the 10.7 builders, and the 10.6 ones are green.
(In reply to Mark Banner (:standard8) from comment #76)
> Is this still an issue? AFAICT bug 706751 is the only issue on the 10.7
> builders, and the 10.6 ones are green.

Probably, because bug 599475 disabled it, so I suspect this is still valid, although we cold probably close it and turn it into a core bug to re-enable that test.
Whiteboard: [10.6][puppet][buildslaves][orange] → [10.6][puppet][buildslaves][orange][test disabled by bug 599475]
Whiteboard: [10.6][puppet][buildslaves][orange][test disabled by bug 599475] → [10.6][puppet][buildslaves][test disabled by bug 599475]
Product: mozilla.org → Release Engineering
I don't want to lose the history here, so based on comment #77, I'm going to move this over to the Testing product for re-enabling when ready rather than closing.
Status: REOPENED → NEW
Component: Other → XPCShell Harness
Product: Release Engineering → Testing
Version: other → unspecified
Summary: test_punycodeURIs.js fails (was timing out) on 10.6 xpcshell test → test_punycodeURIs.js fails (was timing out) on Mac xpcshell test
Well, let's (hollow laugh) give this bug to the owners of the test.
Component: XPCShell Harness → File Handling
Product: Testing → Firefox
Version: unspecified → Trunk
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.