Closed Bug 647126 Opened 11 years ago Closed 10 years ago

jetpack-win7 build fails tests due to test timeout

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86
Windows 7
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: myk, Assigned: myk)

References

Details

Attachments

(9 files, 3 obsolete files)

549 bytes, patch; aki: review+
522 bytes, patch; myk: review+
4.42 KB, patch; myk: review+
953 bytes, patch; myk: review+
704 bytes, patch; myk: review+
761 bytes, patch; myk: review+
41.71 KB, image/png
3.51 KB, patch; warner: review+
251.70 KB, text/plain
After running some tests, the jetpack-win7 build fails tests because of a test timeout of unknown provenance.

http://tinderbox.mozilla.org/showlog.cgi?log=Jetpack/1300863114.1300863502.13823.gz&fulltext=1

Traceback (most recent call last):
  File "bin/cfx", line 29, in <module>
    cuddlefish.run()
  File "c:\talos-slave\jetpack-win7\jetpack-poller\addon-sdk\python-lib\cuddlefish\__init__.py", line 440, in run
    test_all(env_root, defaults=options.__dict__)
  File "c:\talos-slave\jetpack-win7\jetpack-poller\addon-sdk\python-lib\cuddlefish\__init__.py", line 280, in test_all
    test_all_packages(env_root, defaults)
  File "c:\talos-slave\jetpack-win7\jetpack-poller\addon-sdk\python-lib\cuddlefish\__init__.py", line 330, in test_all_packages
    defaults=defaults)
  File "c:\talos-slave\jetpack-win7\jetpack-poller\addon-sdk\python-lib\cuddlefish\__init__.py", line 695, in run
    addons=options.addons)
  File "c:\talos-slave\jetpack-win7\jetpack-poller\addon-sdk\python-lib\cuddlefish\runner.py", line 288, in run_app
    timeout)
Exception: Wait timeout exceeded (300s)
program finished with exit code 1

It isn't clear why this is happening.  It seems likely to be a bug in the SDK rather than the build scripts.  Diagnosing it may require releng to give a Jetpack developer access to a Windows 7 buildslave.  However, the first step to diagnosing it is to enable verbose logging by adding the --verbose flag to the "cfx testall" command on line 78 of run-jetpack.sh:

http://mxr.mozilla.org/build/source/tools/buildfarm/utils/run_jetpack.sh#78

Thus I am filing it in the Release Engineering component for now.
Attached patch add --verbose to help debugging (obsolete) — Splinter Review
from irc with myk and ctalbert:

using --verbose will help verify if the problem is this test hanging, or the next test failing to start.
Attachment #523654 - Flags: review?(aki)
Comment on attachment 523654 [details] [diff] [review]
add --verbose to help debugging

http://hg.mozilla.org/build/tools/rev/dbfbaa9e4554
Attachment #523654 - Flags: review?(aki)
Attachment #523654 - Flags: review+
Attachment #523654 - Flags: checked-in+
Further IRC from ctalbert; myk agrees that ctalbert's suggestion is better.
Attachment #523654 - Attachment is obsolete: true
Attachment #523659 - Flags: review?(aki)
Attachment #523659 - Flags: checked-in?
(In reply to comment #4)
> After running some tests, the jetpack-win7 build fails tests because of a test
> timeout of unknown provenance.

> It isn't clear why this is happening.  It seems likely to be a bug in the SDK
> rather than the build scripts.  Diagnosing it may require releng to give a
> Jetpack developer access to a Windows 7 buildslave.  However, the first step to
> diagnosing it is to enable verbose logging by adding the --verbose flag to the
> "cfx testall" command on line 78 of run-jetpack.sh:
> http://mxr.mozilla.org/build/source/tools/buildfarm/utils/run_jetpack.sh#78

> Thus I am filing it in the Release Engineering component for now.
Patches are live in production to generate verbose info.
Bug 647301 is tracking getting myk access to machines.

Nothing left to do here, so pushing this bug to labs to track debugging the failing test.
Component: Release Engineering → Jetpack SDK
Flags: checked-in?
Flags: checked-in+
Product: mozilla.org → Mozilla Labs
QA Contact: release → jetpack-sdk
Version: other → unspecified
Comment on attachment 523659 [details] [diff] [review]
add --verbose to help debugging (v2)

http://hg.mozilla.org/build/tools/rev/111bf8895346
Attachment #523659 - Flags: review?(aki) → review+
Assignee: nobody → myk
Status: NEW → ASSIGNED
Component: Jetpack SDK → General
Priority: -- → P1
Product: Mozilla Labs → Add-on SDK
QA Contact: jetpack-sdk → general
Target Milestone: --- → 1.0b5
No longer blocks: 647302
Over in bug 647301, Aki gave me access to a Windows 7 machine.  I tried running run_jetpack.sh manually there, and it successfully downloaded the SDK, downloaded the latest mozilla-central build of Firefox, and ran the SDK tests.

There were a few test failures, but they didn't prevent the test run from completing, and there was no test timeout.  So either the machine on which I am testing does not have the timeout problem, or the problem is specific to the way the build process runs run_jetpack.sh.

Unfortunately, I'm not familiar with the build process.  I note from the logs that it seems to start with the directory c:\talos-slave\jetpack-win7\, but that directory doesn't exist on the machine I'm using to test (c:\talos-slave exists and contains a bunch of subdirectories, but it doesn't contain jetpack-win7).

Aki: can you point me towards info on kicking off a build process on the machine?  Seeing if that works seems like the right next step.
This ought to reduce some of the confused interleaving of test results. I'm
pretty sure that the python-side tests ("testcfx") finish normally, then
the firefox/mozrunner-side tests start ("testall", which runs "test" in each
package), one of which hangs and triggers the timeout.

Here's what I think is happening:
 * test_all() is run, which first prints "Testing cfx..." to sys.stdout.
   This gets buffered and doesn't appear for a while.
 * python tests (using stdlib "unittest" package) run normally, write their
   results to sys.stderr, don't flush anything. They write enough data to
   fill the buffer, causing it to flush, so the very first thing we see in
   the logs is the first line of the tests: "Doctest:
   cuddlefish.packaging.validate_resource_hostname ... ok"
 * when they complete (successfully), about 2kB of text is still sitting in
   the sys.stderr buffer, starting in the middle of a test line (e.g.
   "_package_doc (cuddlefish.tests.test_webdocs.WebDocTests) ... ok"),
   continuing through the remaining tests, and finishing with "---\nRan 104
   tests in 8.712s\n\nOK\n".
 * now mozrunner tests begin, writing results to sys.stdout (which still has
   "Testing cfx..." buffered). The first thing written is "Testing
   annotator...", followed by some mozrunner "Using binary at.." lines. Then,
   as the test runs (on windows), cuddlefish.runner copies data out of a disk
   file and writes it to sys.stdout (including lots of noise from the JS
   console.info() calls). Eventually enough data is written to fill the
   buffer, causing it to flush, so we see "Testing cfx..." followed by
   everything else show up in the logs.
 * eventually we hit an item which hangs (probably in test-context-menu, as
   that's the last "info: executing ..." line in the logs). This causes
   cuddlefish/runner.py to timeout, signalled by raising an Exception().
   Python deals with uncaught exceptions by writing a stringified
    representation to sys.stderr, then terminating the program with rc=1. All
    that data to sys.stderr fills the buffer (which still has the last 2kBish
    of text from testcfx, finishing with "OK\n"), causing it to flush, so in the
   logs we see the abrupt transition from console.info() messages to python
   unittest messages. After the "OK" we see the exception string. Then the
   process exits, which flushes all buffers as it closes the filehandles. So
   we see the end of stderr ("Exception: Wait timeout exceeded (300s)"), and
   maybe any lingering stdout (but I don't think there is any).
 * finally, some parent process (run_jetpack.sh? the buildslave itself? some
   other buildstep?) writes "program finished with exit code 1" and
   "elapsedTime=384.307000".

So, this one-line patch adds a flush after the python-side tests (testcfx)
have finished. As this is the barrier between code which uses stderr and code
which uses stdout, this ought to fix the most obvious interleaving problem.
(I see some other weirdness in the logs, specifically a console.log() line
that says "em should be present in overflow submenu", which appears to have
dropped some characters, but we can investigate those later).

If this works, I'll expect to see the log finish with the console.info() line
from the hanging test-context-menu test, followed by the timeout traceback,
followed by the end of file. That should make it easier to spot which test is
hanging.
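
For reference, the shape of the fix is just an explicit flush at the stderr/stdout boundary. A minimal, runnable sketch of the idea (the helper functions are placeholders, not the actual cuddlefish code):

  import sys

  def run_python_side_tests():
      # stand-in for the "testcfx" stage: unittest writes its results to stderr
      sys.stderr.write("Ran 104 tests in 8.712s\n\nOK\n")

  def run_package_tests():
      # stand-in for the "testall" stage: progress banners go to stdout
      sys.stdout.write("Testing annotator...\n")

  def test_all():
      sys.stdout.write("Testing cfx...\n")   # buffered; may not appear immediately
      run_python_side_tests()
      # The one-line fix: flush stderr at the boundary so the unittest output
      # (ending in "OK") shows up in the log before the package tests start.
      sys.stderr.flush()
      run_package_tests()

  if __name__ == "__main__":
      test_all()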
Attachment #525751 - Flags: review?(myk)
Comment on attachment 525751 [details] [diff] [review]
flush stderr after 'testcfx', avoid interleaving results

That sounds like a reasonable explanation; let's give it a shot!
Attachment #525751 - Flags: review?(myk) → review+
Brian landed attachment 525751 [details] [diff] [review]:

  https://github.com/mozilla/addon-sdk/commit/49363bbb6335f35c7dfa1525700e58c0eaa45c12

But it doesn't seem to have helped matters (much?):

  https://github.com/mozilla/addon-sdk/commit/49363bbb6335f35c7dfa1525700e58c0eaa45c12


Digging further, the package tests appear to be getting run twice:

  Testing all available packages: addon-kit, api-utils, test-harness, development-mode.
  Using binary at 'C:\talos-slave\jetpack-w764/jetpack-poller/firefox/firefox.exe'.
  Using profile at 'c:\users\cltbld\appdata\local\temp\tmpev-gxb.mozrunner'.
  Running tests on Firefox 6.0a1/Gecko 6.0a1 ({ec8030f7-c20a-464f-9b0e-13a3a9e97384}) under WINNT/x86_64-msvc.
  info: executing 'test-packaging.testPackaging'
  ...
  Running tests on Firefox 6.0a1/Gecko 6.0a1 ({ec8030f7-c20a-464f-9b0e-13a3a9e97384}) under WINNT/x86_64-msvc.
  info: executing 'test-packaging.testPackaging'
  ...


Also:

myk@myk:~$ grep 'info: pass' log.txt | wc
   6286   52146  314678

(That's about twice as many "info: pass" lines as would be expected.)


Interestingly, the "testing all available packages", "using binary", and "running tests" lines are not reprinted.  So the duplication appears to happen within the package test code rather than outside of it.

Brian: if I recall correctly, at one point the test runner was configured to run package tests twice, once with the experimental e10s support enabled and once with it disabled.  But I thought we changed that.  Is it possible that the change didn't take on Windows 7 for some reason?
(In reply to comment #9)
> Interestingly, the "testing all available packages", "using binary", and
> "running tests" lines are not reprinted.  So the duplication appears to happen
> within the package test code rather than outside of it.

Erm, the "running tests" line is actually reprinted.  I meant to say that the "using profile" line is not reprinted.

And if the runner is kicking off two package test runs, both using the same profile (since the duplication starts after the "using profile" line), and MOZ_NO_REMOTE is set, then that would explain why at least one of those runs times out: the second Firefox process will never fully start up in that case; it'll just display a "Close Firefox" dialog complaining "Firefox is already running, but is not responding. To open a new window, you must first close the existing Firefox process, or restart your system."
(In reply to comment #10)
> And if the runner is kicking off two package test runs, both using the same
> profile (since the duplication starts after the "using profile" line), and
> MOZ_NO_REMOTE is set, then that would explain why at least one of those runs
> times out, since the second Firefox process will never fully start up in that
> case, it'll just display a "Close Firefox" dialog complaining "Firefox is
> already running, but is not responding. To open a new window, you must first
> close the existing Firefox process, or restart your system."

On the other hand, it's the test runner addon that writes the "running tests" line, while mozrunner is responsible for starting Firefox, so it might be the case that Firefox is started only once, but then the runner runs tests twice.
What I've learned so far, looking at the 'testall' sequence on my own box, is that all the little console.info() messages from the JS tests go to stderr, not stdout (at least on OS-X). However the progress messages ("Testing cfx...", "Testing annotator..", "Using binary at..", "Testing all available packages: ..", "Program terminated successfully", and "Some tests were unsuccessful") go to stdout. So there's lots of potential for confusion and output sequencing problems.

I'd like to get everything to write to a single place. I spoke to Atul briefly about it, and it sounds like getting the JS code to write to stdout instead of stderr is difficult, but that it might be straightforward to use the windows path of writing everything to a file, and then having cfx read the file. I'll look into that.
Here's another cleanup patch, to send all test output (including the banner lines like "Testing all packages..") to stderr, not stdout. Basically the JS-side console.info() writes to stderr, and Atul says it's hard to change that, so instead this changes the cfx side to match the JS (and unittest.TextTestRunner) side.

It also adds a few more banners, to make it easier to spot the border between testing the examples and testing the normal (addon-kit/api-utils) packages. This might make it a bit easier to investigate the apparent double-test problem.
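
A rough sketch of the direction this patch takes (not the actual diff; the helper name is made up):

  import sys

  def report(message):
      # cfx's own banner lines ("Testing all available packages: ...",
      # "Testing annotator...", etc.) now go to stderr, matching where the
      # JS-side console.info() output and unittest's TextTestRunner already write.
      sys.stderr.write(message + "\n")
      sys.stderr.flush()

  report("Testing all available packages: addon-kit, api-utils, test-harness.")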
Attachment #526104 - Flags: review?(myk)
Comment on attachment 526104 [details] [diff] [review]
send all test output to stderr, not stdout

I sure wish we could do it the other way around (make mozrunner output to stdout).

This reminds me that we do that funky logfile thing on Windows.  I wonder if that could be the source of this log wonkiness.
Attachment #526104 - Flags: review?(myk) → review+
This patch might help us see whether something is hanging, or whether the buildslave is merely slow and the timeout value is too low.
Attachment #526120 - Flags: review?(myk)
Comment on attachment 526120 [details] [diff] [review]
emit elapsed time once every 5 seconds

>+                next_chime += chime_interval

Nit: if `next_chime = elapsed + chime_interval`, it would ensure two chimes never occur less than chime_interval apart.
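
To illustrate the nit, a stripped-down version of the wait loop with the suggested change applied (variable names are approximate; this is not the actual runner.py code):

  import sys
  import time

  def wait_for_results(results_ready, timeout=300, chime_interval=5):
      start = time.time()
      next_chime = chime_interval
      while True:
          elapsed = time.time() - start
          if elapsed >= next_chime:
              sys.stderr.write("(elapsed time: %d seconds)\n" % elapsed)
              # Re-basing on 'elapsed' (rather than next_chime += chime_interval)
              # guarantees two chimes are never less than chime_interval apart,
              # even if the loop stalls for a while.
              next_chime = elapsed + chime_interval
          if results_ready():
              return
          if elapsed > timeout:
              raise Exception("Wait timeout exceeded (%ds)" % timeout)
          time.sleep(0.5)

Called as wait_for_results(lambda: False, timeout=20), this would chime a few times and then raise the same "Wait timeout exceeded" style of exception seen in the log above.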
Attachment #526120 - Flags: review?(myk) → review+
yeah, I'd prefer to write everything to stdout too. From Atul it sounded like something in the mozilla logging machinery. The logfile thing on windows is a likely suspect: from what I can tell (in harness.js: https://github.com/mozilla/addon-sdk/blob/355d84e71665e748ce880d915af8ff954e275691/python-lib/cuddlefish/app-extension/components/harness.js#L548) the logfile is *in addition to* the normal dump() function. It's conceivable that the logFile code was originally written for an environment in which, e.g., the browser launches in a way that discards its normal stderr, but the buildslave is running it in a way which preserves stderr, and then the use of the logFile would result in duplicate messages (one from the real stderr, the other copied from the logfile). A simple way to check this would be to change runner.py to prefix each chunk of logfile data with "LOGFILE:", then see if only one of the two copies has the prefix.
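
The suggested check could be as small as this (a sketch; the real runner.py reads the logfile incrementally, and the prefix is only a debugging aid):

  import sys

  def forward_logfile_chunk(chunk, out=sys.stderr):
      # Prefix every line copied out of the on-disk logfile so that, if a
      # message shows up twice in the buildslave log, we can tell which copy
      # came from the logfile and which from the browser's own stderr.
      for line in chunk.splitlines(True):
          out.write("LOGFILE: " + line)

  forward_logfile_chunk("info: executing 'test-packaging.testPackaging'\n")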
(In reply to comment #18)
> yeah, I'd prefer to write everything to stdout too. From Atul it sounded like
> something in the mozilla logging machinery.

FWIW dump() in sandboxes hardcodes stderr:

http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/xpccomponents.cpp#3011

I haven't been following along but could you just open a stream to stdout instead of using dump()?

(dump() in DOM windows, on the other hand, hardcodes stdout:)

http://mxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#4469
Ok, it looks like the win7 tests are down to one failure, a timeout in
test-context-menu. Looking at the test-running code
(packages/api-utils/lib/unit-test.js, specifically
https://github.com/mozilla/addon-sdk/blob/8dd64c64abf7fc3eda997d35590bb9c5ac1ed10b/packages/api-utils/lib/unit-test.js#L230),
it appears that tests which use waitUntilDone() (including all of
test-context-menu) get 10 seconds (DEFAULT_PAUSE_TIMEOUT) to complete. If
they haven't finished in that time, waitUntilDone() marks them as failed and
starts the next test. waitUntilDone() doesn't have any way to stop the work
that's already been queued up: it's not so much halting the test as it is
hoping that the test has hung and can be safely ignored.

In our case, I think test-context-menu.testSortingOverflow takes more than 10
seconds, but it's not hung, it's just slow. So the next test
(testSortingMultipleModules) begins, and overlaps with testSortingOverflow. I
can't tell that this is actually causing problems, but I can easily imagine
other tests where it would.

So I propose to increase DEFAULT_PAUSE_TIMEOUT to 30 seconds, and see if that
helps.

The deeper question is why win7 is taking about seven times as long to run
the test suite as linux and OS-X.
It looks like test-context-menu tests are taking a long (but finite) time to run, exceeding the 10-second per-test timeout. By raising it to 30s, we might avoid this problem.
Attachment #526412 - Flags: review?(myk)
This new version of the patch changes the timeout only for test-context-menu, leaving it at 10s for all other tests.
Attachment #526412 - Attachment is obsolete: true
Attachment #526412 - Flags: review?(myk)
Attachment #526413 - Flags: review?(myk)
Attachment #526413 - Flags: review?(myk) → review+
Could you make logging optional (and non-default)? It took me some time and investigation before I found out why my jetpack started to produce rather confusing messages:

(elapsed time: 5 seconds)
info: Message: cmd = CreateButtons, data = [object Object]
info: onMessage - incoming : msg.cmd = CreateButtons
(elapsed time: 10 seconds)
(elapsed time: 15 seconds)
(elapsed time: 20 seconds)
(elapsed time: 25 seconds)
(elapsed time: 30 seconds)
(elapsed time: 35 seconds)
(elapsed time: 40 seconds)
(elapsed time: 45 seconds)
(elapsed time: 50 seconds)
(elapsed time: 55 seconds)
(elapsed time: 60 seconds)
(elapsed time: 65 seconds)
(elapsed time: 70 seconds)
(elapsed time: 75 seconds)
(elapsed time: 80 seconds)
(elapsed time: 85 seconds)
(elapsed time: 90 seconds)
(elapsed time: 95 seconds)
(elapsed time: 100 seconds)
(elapsed time: 105 seconds)
(elapsed time: 110 seconds)
(elapsed time: 115 seconds)

for normal cfx run. I guess if it was printed only for cfx test -v, I wouldn't have any objections.
Matej: that logging of elapsed time is temporary, while we work out the kinks in our test automation.  We'll remove it once we're done.
The fedora64 buildslave stopped while finishing up a test, possibly because runner.py never saw the "OK" or "FAIL" result it was looking for. This patch prints out the resultfile contents when they are something unexpected.
Attachment #526851 - Flags: review?(myk)
Comment on attachment 526851 [details] [diff] [review]
show unexpected resultsfile contents

>+                    sys.stderr.write("Hrm, resultfile (%s) contained something weird (%d bytes)\n" % (resultfile, len(output)))
>+                    sys.stderr.write("'"+output+"'\n")

Nit: I'd add a colon to the end of the first line to make it more obvious that the weird thing will be output next.

r=myk!
Attachment #526851 - Flags: review?(myk) → review+
Just got timeouts like this on both XP and Win7, on a push that doesn't seem like it should have anything to do with JP:

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303835594.1303836477.11203.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303835633.1303836741.12365.gz
(In reply to comment #27)
> Just got timeouts like this on both XP and Win7, on a push that doesn't seem
> like it should have anything to do with JP:
> 
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303835594.1303836477.11203.gz
> http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1303835633.1303836741.12365.gz

Those are debug builds, so I suspect bug 651665, although the signature is slightly different.  We should land the platform fix for that bug and then see if this still happens.
Those were the first run of debug builds on Windows; they were permared, and now they're hidden.
Building debug now to diagnose this...
This screenshot shows that the problem on Windows debug builds is an assertion dialog that shows up and stops the proceedings.

As I recall, there's a way to turn off these dialogs.  Digging...
Ah, adw points me to <https://developer.mozilla.org/en/XPCOM_DEBUG_BREAK>.  Now to figure out where to plug it into the test harness...
Brian pointed out where to plug it into the test harness (runner.py), but it doesn't seem to work.  Maybe it doesn't apply to getenv.c, or perhaps, since getenv.c appears to be for getting environment variables, the variable hasn't been read yet.
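
For reference, the usual way to plumb this through is to set the variable in the environment handed to the child Firefox process. Whether this is the exact spot that was tried here is unclear; a sketch under that assumption:

  import os

  def build_child_environment():
      env = dict(os.environ)
      # Ask debug builds to only warn (no modal assertion dialog) when an
      # NS_ASSERTION fires; see the XPCOM_DEBUG_BREAK documentation.
      env["XPCOM_DEBUG_BREAK"] = "warn"
      env["MOZ_NO_REMOTE"] = "1"
      return env

mozrunner would then be launched with this environment instead of the inherited one.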
The assertion on Firefox Windows debug builds appears to be caused by an overlarge HARNESS_OPTIONS.  When you install an SDK-based addon, harness options are read from a file, but when you do `cfx run` or `cfx test` they come in through an environment variable.  But they can take up 30-50K, which is apparently too large for an environment variable on Windows.

Here's a patch that unconditionally writes them to a file (and then makes the harness read them therefrom).
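
Very roughly, the shape of the change on the Python side (the helper and the exact file location are illustrative; the real patch differs in details):

  import json
  import os

  def write_harness_options(options, profile_dir):
      # Instead of stuffing 30-50K of JSON into the HARNESS_OPTIONS environment
      # variable (which overflows the environment on Windows), write it to a
      # file that the harness can read at startup.
      path = os.path.join(profile_dir, "harness-options.json")
      with open(path, "w") as f:
          json.dump(options, f)
      return path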
Attachment #528750 - Flags: review?(warner-bugzilla)
Comment on attachment 528750 [details] [diff] [review]
don't use a HARNESS_OPTIONS environment variable

r+ by me. So crazy it just might work.

Some nits:
 - the remaining harness.js error message ("HARNESS_OPTIONS env var must exist") should probably mention the harness-options.json file instead
 - in runner.py: use atexit.register() to maybe-remove the options file even if the runner dies for some reason
Attachment #528750 - Flags: review?(warner-bugzilla) → review+
This patch removes the file at exit (via atexit) and cleans up references to HARNESS_OPTIONS in a comment and an error message.
Attachment #528750 - Attachment is obsolete: true
Attachment #528755 - Flags: review?(warner-bugzilla)
Comment on attachment 528755 [details] [diff] [review]
don't use a HARNESS_OPTIONS environment variable, take 2

Looks ok. In the long run (say, before 1.0), I want to find another solution, because this makes it impossible to use a single SDK to run two unit tests (for different addons) at the same time. But it's fine for now. r+.
Attachment #528755 - Flags: review?(warner-bugzilla) → review+
Committed:

https://github.com/mozilla/addon-sdk/commit/dee3fed1b709db8c4e191ec3c0c49d7e98720941

I ended up reverting the atexit change because it queued up multiple attempts to delete the file and then complained that it couldn't find it on the second and subsequent attempts when I ran `cfx testall`.  So the patch I checked in is a combination of the first and second patches.
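
If the atexit approach is revisited, one way to keep repeated cleanup attempts from complaining is to make the removal idempotent (a sketch under that assumption, not what landed):

  import atexit
  import os

  def schedule_cleanup(path):
      def remove_quietly():
          try:
              os.remove(path)
          except OSError:
              # Already removed by an earlier attempt (or never created); not an error.
              pass
      atexit.register(remove_quietly)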
Target Milestone: 1.0b5 → 1.0
I'm not sure why we didn't resolve this before, but in any case the most recent commit resolved the problem in question.  There are still some timeouts due to bug 651665, and it's possible that bug obscures other issues, but in that case we should file new bugs on those issues once that bug is fixed.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
OS: Windows 7 → All
Resolution: FIXED → ---
I am sorry, but this bug (if it is this bug) seems to be still doing just fine with commit e3a7dc6ab6493d078b04bf900950f8e5f6a7747b and
Mozilla/5.0 (X11; Linux x86_64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Matej: that particular test failure (Package Not Found in test_generate_static_docs_does_not_smoke) is now part of bug 659478; it's not a timeout, but still something we're trying to figure out.
(In reply to comment #41)
> Matej: that particular test failure (Package Not Found in
> test_generate_static_docs_does_not_smoke) is now part of bug 659478.. not a
> timeout, but still something we're trying to figure out.

Just to note that both this and other bugs are labeled as happening on Windows. This is Linux (Fedora 15) here. Joining the merry party on the other bug.
Reresolving, as mcepl's issue is a different bug.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
OS: All → Windows 7
Resolution: --- → FIXED