Closed
Bug 994658
Opened 11 years ago
Closed 11 years ago
Several JSbridge disconnect failures
Categories
(Testing Graveyard :: Mozmill, defect)
Testing Graveyard
Mozmill
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: andrei, Unassigned)
References
(Depends on 1 open bug)
Details
Attachments
(12 files, 1 obsolete file)
The category will need to be updated.
We've seen several such disconnects today across Windows and OSX nodes.
All failed during restart tests.
Attached is a sample log.
Comment 1•11 years ago
|
||
All this most likely happens because of bug 974971. It looks like Firefox doesn't restart after test1 but just closes down. With my fix on bug 975068 we would know more.
Can you try to reproduce it?
Comment 2•11 years ago
|
||
I tried to reproduce this on mm-win-81-32-4 with the fix from bug 975068 applied and with "Gecko profiler 1.12.23" from bug 974971 installed. I ran about 30 testruns and it didn't fail once.
| Reporter | ||
Comment 3•11 years ago
|
||
Let's try reproducing these disconnects.
It's affecting our ability to properly run tests in CI in a big way.
Assignee: nobody → andrei.eftimie
Status: NEW → ASSIGNED
| Reporter | ||
Comment 4•11 years ago
|
||
I've run this all day with a patched mozmill (with bug 975068 attachment 8380150 [details] [diff] [review]) on mm-win-81-32-4 and it hasn't disconnected even once. I've set up some ENV variables (notably proxy, WORKSPACE).
I've run both full functional testruns and restart tests directly via mozmill.
I'll keep running these testruns; hopefully it will disconnect at some point.
Comment 5•11 years ago
|
||
I would suggest you run it without my patch first. It might be that it makes a difference. Once you have a reproduction pattern it will be useful to have this additional information available.
| Reporter | ||
Comment 6•11 years ago
|
||
I actually ran it on a clean 2.0.6.1 env yesterday (again without success).
But I agree, having it fail _either way_ is better than not having it fail at all. I'll do more runs on a clean env. I'll take another machine and run them in parallel.
| Reporter | ||
Comment 8•11 years ago
|
||
This is raw data computed from failure mails.
I went through all failure emails from the 7th until 15th April (local time).
There are also reported failures from 5 and 6 (they were weekend days) which were handled on the 7th.
There's Date and Time (local, based on received email timestamp), machine, last test (where available) and additional info on some.
Total 117 jsbridge disconnects.
I didn't include multiple ondemand update failures which didn't run and failed with 'updateStagingPath'. There is a possibility that they are related.
No hard conclusions. I just gathered the data.
Some notes:
- vista has almost all failures on mm-win-vista-32-3, and most of them didn't run any tests at all (it looks a bit different from the other ones)
- osx failed mostly in testAddons_installTheme/test1.js. Also some in testPasswordSavedAndDeleted.js and testAddons_uninstallTheme/test1.js
- for win8/8.1: it seems once a machine is affected, multiple failures occur on that machine. This isn't a hard rule, but it looks to be statistically significant. Not sure what it means...
If anyone has the time to look over the data, please. Maybe you notice something.
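A per-machine tally like the one described in the notes above could be sketched as follows. This is only an illustration in Python 3: the record layout (timestamp, machine, last test) mirrors the columns mentioned in this comment, but the function names `failures_per_machine` and `repeat_offenders` are hypothetical and not part of any attached script.

```python
from collections import Counter


def failures_per_machine(records):
    """Count disconnects per machine.

    `records` is an iterable of (timestamp, machine, last_test) tuples;
    this layout is an assumption based on the columns described above.
    """
    return Counter(machine for _timestamp, machine, _last_test in records)


def repeat_offenders(counts, minimum=2):
    """Return (machine, count) pairs with at least `minimum` disconnects,
    most affected machine first."""
    return [(machine, n) for machine, n in counts.most_common() if n >= minimum]
```

Feeding the collected rows through `failures_per_machine` and then `repeat_offenders` would make the "once a machine is affected, it fails repeatedly" pattern easier to check than reading the raw list.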
Comment 9•11 years ago
|
||
One testrun failed with a JSBridge disconnect on Windows 8.1 64-bit with the latest installed Aurora version, in restartTests\testAddons_installFromFTP\test1.js.
In the console, besides the traceback and the jsbridge error, there were also 5 occurrences of this error:
> 05:00:43 *************************
> 05:00:43 A coding exception was thrown and uncaught in a Task.
> 05:00:43
> 05:00:43 Full message: TypeError: worker is null
> 05:00:43 Full stack: post/</<@resource://gre/modules/osfile/osfile_async_front.jsm:355:9
> 05:00:43 TaskImpl_run@resource://gre/modules/Task.jsm:282:1
> 05:00:43 TaskImpl@resource://gre/modules/Task.jsm:247:3
> 05:00:43 createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:224:7
> 05:00:43 Task_spawn@resource://gre/modules/Task.jsm:139:5
> 05:00:43 post/<@resource://gre/modules/osfile/osfile_async_front.jsm:339:28
> 05:00:43 Handler.prototype.process@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:707:11
> 05:00:43 this.PromiseWalker.walkerLoop@resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:586:7
> 05:00:43 Spinner.prototype.observe@resource://gre/modules/AsyncShutdown.jsm:446:7
> 05:00:43
> 05:00:43 *************************
Comment 10•11 years ago
|
||
Can we check other testruns for this particular promise exception? Does it always occur in those disconnects? Or even without it?
Comment 11•11 years ago
|
||
We established last week that Andrei will no longer work on this at the moment. Daniel, please check the comment above, thanks!
Assignee: andrei.eftimie → nobody
Comment 12•11 years ago
|
||
I looked into the latest jsbridge disconnects and the promise exception was present only in this one case, so I think it might not be related.
From 19-22 April we've seen about 30 jsbridge disconnects. Most of them (over 20) were on windows.
| Reporter | ||
Comment 13•11 years ago
|
||
I was running some testruns and reproduced this JSBridge locally once. (4th testrun, OSX, in restartTests/testAddons_installTheme/test1.js).
Ran this testmodule in a loop 100 times with the WIP patch from bug 975068 applied to get more info. Did not have another jsbridge dc yet.
But this might be interesting:
https://pastebin.mozilla.org/4944191
Not sure if I read this correctly, but from a failed addon install (promise related?) the stack leads us to AsyncShutdown. This _might_ be related as in our JSBridge disconnects, Firefox should restart, yet it only shuts down without being reopened.
Could a faulty shutdown mess with our restart flags?
| Reporter | ||
Comment 14•11 years ago
|
||
With no disconnect in 200 runs, I've hacked a functional testrun to run the first few tests (until installTheme) and ran this in a loop.
In the first 10 runs I got the following failure:
http://mozmill-crowd.blargon7.com/#/functional/report/db3ef7ec039e4c6b255196b62805f7d0
Followed by a disconnect (though it is reported a bit differently):
> ERROR | Test Failure | {
> "exception": {
> "message": "Notification popup state has been open",
> "lineNumber": 27,
> "name": "TimeoutError",
> "fileName": "resource://mozmill/modules/errors.js"
> }
> }
> TEST-UNEXPECTED-FAIL | restartTests/testAddons_installMultipleExtensions/test1.js | testInstallMultipleExtensions
> TEST-START | restartTests/testAddons_installMultipleExtensions/test1.js | teardownModule
> TEST-END | restartTests/testAddons_installMultipleExtensions/test1.js | finished in 7046ms
> RESULTS | Passed: 7
> RESULTS | Failed: 1
> RESULTS | Skipped: 0
> Report document created at 'http://mozmill-crowd.blargon7.com/db/db3ef7ec039e4c6b255196b62805f7d0'
> *** Removing profile: /var/folders/9l/sn_p3bw914s360j602z20jsc0000gq/T/tmpXpMZA7.workspace/profile
> *** Removing test repository '/var/folders/9l/sn_p3bw914s360j602z20jsc0000gq/T/tmpXpMZA7.workspace/mozmill-tests'
> Traceback (most recent call last):
> File "/Users/andrei.eftimie/work/mozilla/mozmill/lib/python2.7/site-packages/mozmill_automation-2.1_dev-py2.7.egg/mozmill_automation/testrun.py", line 351, in run
> self.run_tests()
> File "/Users/andrei.eftimie/work/mozilla/mozmill/lib/python2.7/site-packages/mozmill_automation-2.1_dev-py2.7.egg/mozmill_automation/testrun.py", line 575, in run_tests
> TestRun.run_tests(self)
> File "/Users/andrei.eftimie/work/mozilla/mozmill/lib/python2.7/site-packages/mozmill_automation-2.1_dev-py2.7.egg/mozmill_automation/testrun.py", line 302, in run_tests
> self._mozmill.run(tests, self.options.restart)
> File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/__init__.py", line 444, in run
> self.stop_runner()
> File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/__init__.py", line 576, in stop_runner
> raise Exception('client process shutdown unsuccessful')
> Exception: client process shutdown unsuccessful
| Reporter | ||
Comment 15•11 years ago
|
||
I've run a Debug build and lo and behold I got a crash (in the same place as the usual JSbridge dc).
I'm missing the dmp file, and it's not mentioned in about:crashes
I did have the OSX "This application crashed" window. I'll try to see if that saved a dump somewhere.
| Reporter | ||
Comment 16•11 years ago
|
||
Well here's the crash as logged by OSX.
> Crashed Thread: 31 DOM Worker
>
> Exception Type: EXC_BAD_ACCESS (SIGSEGV)
> Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000000
| Reporter | ||
Comment 17•11 years ago
|
||
Another crash. Same location. Still no dump, only the OSX crash log.
But this one reports problems in malloc:
> Thread 0:: Dispatch queue: com.apple.main-thread
> 0 libsystem_malloc.dylib 0x00007fff8f3c5287 szone_malloc_should_clear + 4
> 1 libsystem_malloc.dylib 0x00007fff8f3c7868 malloc_zone_malloc + 71
> 2 libsystem_malloc.dylib 0x00007fff8f3c827c malloc + 42
> 3 libmozalloc.dylib 0x00000001000823ae moz_xmalloc + 14
Comment 18•11 years ago
|
||
(In reply to Andrei Eftimie from comment #13)
> Not sure if I read this correctly, but from a failed addon install (promise
> related?) the stack leads us to AsyncShutdown. This _might_ be related as in
> our JSBridge disconnects, Firefox should restart, yet it only shuts down
> without being reopened.
>
> Could a faulty shutdown mess with our restart flags?
Absolutely. An exception during shutdown can certainly cause some code not to be executed. So a requested restart may fail and Firefox just quits. The exit code of the Firefox process would be interesting here.
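As a rough sketch of what looking at the exit code could mean: on POSIX systems Python's `subprocess` reports a negative return code when the child was killed by a signal, which would distinguish a crash (for example SIGSEGV) from a plain quit. The helper names below are hypothetical, the code assumes Python 3, and it is not how mozmill itself tracks the process.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # POSIX only: a negative value -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "exited with code %d" % returncode


def run_and_describe(command):
    """Run `command` (a list of arguments), wait for it, and describe the exit."""
    returncode = subprocess.call(command)
    return returncode, describe_exit(returncode)
```

Called with the application's command line, `run_and_describe` would return a pair such as `(0, "exited normally")` for a clean quit, so a restart that silently turned into a shutdown would at least leave a recorded exit status.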
Comment 19•11 years ago
|
||
(In reply to Andrei Eftimie from comment #15)
> Created attachment 8412452 [details]
> crash_2014-04-25.txt
>
> I've run a Debug build and lo and behold I got a crash (in the same place as
> the usual JSbridge dc).
>
> I'm missing the dmp file, and its not mentioned in about:crashes
> I did have the OSX "This application crashed" window. I'll try to see if
> that saved a dump somewhere.
Yeah, crashes on OS X and the Apple reporter. I didn't work on OS X for a long time, so I forgot how to get proper crash reports out of a debug build. Steven, can you please help us here?
Flags: needinfo?(smichaud)
| Reporter | ||
Comment 20•11 years ago
|
||
Reproduced directly with mozmill.
Debug build and --debug option in mozmill.
| Reporter | ||
Comment 21•11 years ago
|
||
I can reproduce it directly via mozmill by running:
> mozmill -m firefox/tests/functional/restartTests/testAddons_installTheme/manifest.ini -b /Applications/FirefoxNightlyDebug.app/ --debug --profile=../profile/p1
I've also run only the first test:
> mozmill -t firefox/tests/functional/restartTests/testAddons_installTheme/test1.js -b /Applications/FirefoxNightlyDebug.app/ --debug --profile=../profile/p1
and I did get a Disconnect Error, but different from the rest.
Attached is the log for this.
In this case I still had the Firefox window open. Before it disconnected I noticed that the messages logged in the console slowed down considerably. So the last messages:
> --DOMWINDOW == 18 (0x11c1d5e70) [pid = 8249] [serial = 18] [outer = 0x11e7bee30] [url = about:blank]
> --DOMWINDOW == 17 (0x1142261b0) [pid = 8249] [serial = 8] [outer = 0x11418ed30] [url = about:blank]
> --DOMWINDOW == 16 (0x11c1d3570) [pid = 8249] [serial = 15] [outer = 0x0] [url = about:blank]
> --DOMWINDOW == 15 (0x1137cb5f0) [pid = 8249] [serial = 11] [outer = 0x0] [url = about:blank]
> --DOMWINDOW == 14 (0x1140d8ea0) [pid = 8249] [serial = 12] [outer = 0x0] [url = about:home]
> TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
were spread 5-20 sec apart each. It looked like it slowed down until it stopped...
Comment 22•11 years ago
|
||
Regarding the crash for the debug build, let's do the following:
1. Try to repro on 10.8 which still supports gdb
2. Let's only run this single test with --debugger=gdb specified for mozmill
3. When it crashes do a 'bt' in gdb to get the stack
4. Fix the stack by running it through http://mxr.mozilla.org/mozilla-central/source/tools/rb/fix_macosx_stack.py
Comment 23•11 years ago
|
||
(In reply to comment #19)
Apple crash reports can most likely be found in ~/Library/DiagnosticReports/. But you might also want to look in /Library/DiagnosticReports/.
Flags: needinfo?(smichaud)
| Reporter | ||
Comment 24•11 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #22)
> 1. Try to repro on 10.8 which still supports gdb
It seems GDB is not tied to OS releases, but to Xcode (Command Line Tools).
I've tried this on a (relatively freshly installed) 10.8 machine and GDB wasn't readily available.
I've found some avenues for installing GDB, I'll give them a go. (Will try this on my main 10.9 machine)
| Reporter | ||
Comment 25•11 years ago
|
||
Well, this is underwhelming.
Finally managed to get a crash under gdb.
Stack looks useless:
> (gdb) bt
> #0 0x00000001024ef38f in ?? ()
> #1 0x0000000000000000 in ?? ()
Attached is the whole log. I'll try again to see if I get the same result...
| Reporter | ||
Comment 26•11 years ago
|
||
Same "trace" on a second crash:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x170b of process 1987]
> 0x00000001024ef38f in ?? ()
> (gdb) bt
> #0 0x00000001024ef38f in ?? ()
> #1 0x0000000000000000 in ?? ()
Not sure if I'm doing something wrong
Comment 27•11 years ago
|
||
Check which thread has crashed. Can you retry with a self-made build? There is still no information for the above trace.
Also I wonder what this thread 0x170b actually is, which is created right after an assertion:
Assertion failure: IsCanceled() (Subclass Cancel() didn't set IsCanceled()!), at /builds/slave/m-cen-osx64-d-0000000000000000/build/dom/workers/WorkerRunnable.cpp:278
[New Thread 0x170b of process 1723]
[..]
Source:
http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerRunnable.cpp#278
It looks like a worker issue. Olli, do you have an idea here?
Flags: needinfo?(bugs)
Updated•11 years ago
|
Flags: needinfo?(bugs)
| Reporter | ||
Comment 29•11 years ago
|
||
Sorry for spamming with logs, but hopefully we'll get to the bottom of this issue.
I haven't had a crash with a locally built (debug, with symbols) FF. But I did have 2 different outcomes. GDB will not run a second test, but it will honor the restart.
Attached is the log where we RECONNECT to Firefox.
| Reporter | ||
Comment 30•11 years ago
|
||
And here is the log where we DON'T RECONNECT.
I'm diffing them ATM to see if anything stands out. It could very well be that this case is the problematic behaviour. With all debug options activated maybe we'll find something.
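Diffing two debug logs like these is noisy because pids, pointer addresses and bridge UUIDs change on every run. A minimal sketch of a normalizing diff, assuming Python 3 and log lines shaped like the `--DOMWINDOW` / `--DOCSHELL` output in this bug (the placeholder names and the file labels `reconnect.log` / `no_reconnect.log` are made up for illustration):

```python
import difflib
import re

# Values that differ between runs and would otherwise drown the real changes.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "0xADDR"),   # pointer addresses
    (re.compile(r"\[\d+\]"), "[PID]"),           # "[59393] WARNING: ..." prefixes
    (re.compile(r"pid = \d+"), "pid = PID"),     # "[pid = 59785]" fields
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"),
     "UUID"),                                    # jsbridge registry ids
]


def normalize(line):
    """Replace run-specific values in one log line with fixed placeholders."""
    for pattern, placeholder in _VOLATILE:
        line = pattern.sub(placeholder, line)
    return line.rstrip()


def diff_logs(passing_lines, failing_lines):
    """Return a unified diff (as a list of lines) of the normalized logs."""
    a = [normalize(line) for line in passing_lines]
    b = [normalize(line) for line in failing_lines]
    return list(difflib.unified_diff(
        a, b, "reconnect.log", "no_reconnect.log", lineterm="", n=0))
```

With the noise removed, only lines that exist in one log but not the other should remain in the output, which is the part worth reading.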
Comment 31•11 years ago
|
||
That looks suspicious:
[59393] WARNING: '!fd.IsInitialized()', file /Users/andrei.eftimie/work/mozilla/gecko-dev/netwerk/base/src/nsSocketTransport2.cpp, line 2597
[59393] WARNING: 'NS_FAILED(rv)', file /Users/andrei.eftimie/work/mozilla/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp, line 1638
[59393] WARNING: 'NS_FAILED(rv)', file /Users/andrei.eftimie/work/mozilla/gecko-dev/netwerk/protocol/http/nsHttpConnection.cpp, line 370
Can you check what's in those lines? We may have to set a break point there and do a detailed check.
| Reporter | ||
Comment 32•11 years ago
|
||
This is the line: http://dxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsSocketTransport2.cpp#2597
But this is for the passing log. Since that was run under GDB, I am expecting some problems after the first test since gdb doesn't run subsequent tests after a restart.
| Reporter | ||
Comment 33•11 years ago
|
||
When this fails we have the following items in the log right before the quit-application observer, which we do not have when this passes:
> [59785] WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file /Users/andrei.eftimie/work/mozilla/gecko-dev/widget/cocoa/nsChildView.mm, line 5305
> --DOMWINDOW == 21 (0x1196d9c00) [pid = 59785] [serial = 13] [outer = 0x0] [url = about:blank]
> --DOMWINDOW == 20 (0x12698f800) [pid = 59785] [serial = 14] [outer = 0x0] [url = about:home]
> --DOMWINDOW == 19 (0x1239a6400) [pid = 59785] [serial = 18] [outer = 0x0] [url = about:blank]
> --DOMWINDOW == 18 (0x1239a6800) [pid = 59785] [serial = 15] [outer = 0x0] [url = about:newtab]
> * Sending message: '{"eventType":"mozmill.pass","result":{"function":"assert.waitFor()"}}'
> * Set: '9ff4fac5-cee6-11e3-bc67-c42c03164de7'
> * Sending message: '{"result":true,"data":"bridge.registry[\"{3e299c49-bc89-214f-95ac-ec86935492dc}\"]","uuid":"9ff4fac5-cee6-11e3-bc67-c42c03164de7"}'
> --DOCSHELL 0x1235cf000 == 7 [pid = 59785] [id = 7]
> * Observer topic: 'quit-application'
Passes:
> [59497] WARNING: NS_ENSURE_TRUE(mTextInputHandler) failed: file /Users/andrei.eftimie/work/mozilla/gecko-dev/widget/cocoa/nsChildView.mm, line 5305
> * Sending message: '{"eventType":"mozmill.pass","result":{"function":"assert.waitFor()"}}'
> * Set: '941c4d7a-cee5-11e3-8769-c42c03164de7'
> * Sending message: '{"result":true,"data":"bridge.registry[\"{f7ba1372-6019-404c-a7f4-c5cb6f473464}\"]","uuid":"941c4d7a-cee5-11e3-8769-c42c03164de7"}'
> --DOCSHELL 0x118341000 == 7 [pid = 59497] [id = 7]
> * Observer topic: 'quit-application'
| Reporter | ||
Comment 34•11 years ago
|
||
Not sure how to debug this further. I'll make a minimized testcase to simplify the logs.
Comment 35•11 years ago
|
||
That may be better, yes. I could have a look into this then. Thanks!
| Reporter | ||
Comment 36•11 years ago
|
||
This can still be slimmed down a bit. I'll have a
Make sure to have proper paths to its dependencies.
Original location:
> firefox/tests/functional/restartTests/testAddons_installTheme/test1.js
On OSX 10.9 with a Nightly Debug build running:
> mozmill -t firefox/tests/functional/restartTests/testAddons_installTheme/test1.js -b /Applications/FirefoxNightlyDebug.app/ --profile=../profile/p1/
Yields a crash (crash only with a debug build, failure with an opt build) at roughly 30-50%.
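To put a number on a rate like the one above, the command can be looped and the failing runs counted. The sketch below is Python 3 and rests on one loudly stated assumption: that the command returns a non-zero exit status when the run fails or the application crashes. Whether mozmill does so in every disconnect case is not confirmed in this bug. The function names are hypothetical.

```python
import subprocess


def count_failures(command, runs):
    """Run `command` (a list of arguments) `runs` times.

    Returns how many runs ended with a non-zero exit status.
    Assumption: a failed or crashed run is reported via the exit status.
    """
    failures = 0
    for _ in range(runs):
        if subprocess.call(command) != 0:
            failures += 1
    return failures


def failure_rate(command, runs):
    """Return the fraction of failing runs, between 0.0 and 1.0."""
    return count_failures(command, runs) / runs
```

Passing the `mozmill -t ...` invocation from this comment as the argument list would give a rate that can be compared between builds or profiles.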
Comment 37•11 years ago
|
||
I tried to reproduce this on Windows, but couldn't in about 30 runs.
If I run with a profile that has the geckoprofiler addon installed, it fails 2-3 times in 25 runs with:
>console.error: geckoprofiler:
> Profiler module not found: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import], undefined
>TEST-START | test1.js | setupModule
>TEST-START | test1.js | testInstallTheme
>TEST-PASS | test1.js | testInstallTheme
>TEST-START | test1.js | teardownModule
>TEST-END | test1.js | finished in 1451ms
>TEST-START | test2.js | setupModule
>TEST-START | test2.js | testThemeIsInstalled
>TEST-PASS | test2.js | testThemeIsInstalled
>TEST-START | test2.js | teardownModule
>TEST-END | test2.js | finished in 1375ms
>Parent process 2832 exited with children alive:
>PIDS: 1964
>Attempting to kill them...
>Parent process 2832 exited with children alive:
>PIDS: 1964
>Attempting to kill them...
>Error Code 6 trying to query IO Completion Port, exiting
>Exception in thread Thread-1:
>Traceback (most recent call last):
> File "C:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\Lib\threading.py", line 551, in __bootstrap_inner
> self.run()
> File "C:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\Lib\threading.py", line 504, in run
> self.__target(*self.__args, **self.__kwargs)
> File "c:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\lib\site-packages\mozprocess\processhandler.py", line 321, in _procmgr
> self._poll_iocompletion_port()
> File "c:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\lib\site-packages\mozprocess\processhandler.py", line 371, in _poll_iocompletion_port
> raise WinError(errcode)
>WindowsError: [Error 6] The handle is invalid.
>
>RESULTS | Passed: 2
>RESULTS | Failed: 0
>RESULTS | Skipped: 0
or
>console.error: geckoprofiler:
> Profiler module not found: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import], undefined
>TEST-START | test1.js | setupModule
>TEST-START | test1.js | testInstallTheme
>TEST-PASS | test1.js | testInstallTheme
>TEST-START | test1.js | teardownModule
>TEST-END | test1.js | finished in 1474ms
>TEST-START | test2.js | setupModule
>TEST-START | test2.js | testThemeIsInstalled
>TEST-PASS | test2.js | testThemeIsInstalled
>TEST-START | test2.js | teardownModule
>TEST-END | test2.js | finished in 928ms
>RESULTS | Passed: 2
>RESULTS | Failed: 0
>RESULTS | Skipped: 0
>1398774567923 addons.xpi ERROR Failed to remove directory c:\Users\svuser\Desktop\newProfile\extensions\staged\mozmill@mozilla.com
>1398774567924 addons.xpi ERROR Failure moving c:\Users\svuser\Desktop\newProfile\extensions\staged\mozmill@mozilla.com to c:\Users\svuser\Desktop\newProfile\extensions
>1398774567927 addons.xpi ERROR Failed to install staged add-on mozmill@mozilla.com in app-profile
>console.error: geckoprofiler:
> Profiler module not found: Component returned failure code: 0x80520012 (NS_ERROR_FILE_NOT_FOUND) [nsIXPCComponents_Utils.import], undefined
>Timeout: bridge.set("e8d584a1-cf99-11e3-8cd7-08002796b112", Components.utils.import("resource://mozmill/driver/mozmill.js"))
>
>TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
>RESULTS | Passed: 0
>RESULTS | Failed: 0
>RESULTS | Skipped: 0
>Traceback (most recent call last):
> File "c:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\lib\site-packages\mozmill\__init__.py", line 831, in run
> mozmill.run(tests, self.options.restart)
> File "c:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\lib\site-packages\mozmill\__init__.py", line 429, in run
> frame = self.run_test_file(frame or self.start_runner(),
> File "c:\Users\svuser\Desktop\2.0.6-windows\mozmill-env\python\lib\site-packages\mozmill\__init__.py", line 347, in start_runner
> self.minidump_save_path = os.path.join(appinfo['paths']['appdata'],
>KeyError: 'paths'
| Reporter | ||
Comment 38•11 years ago
|
||
Running a new profile with the testcase showed no more failures.
Comparing it with the affected profile, I found the following pref was still active there:
> user_pref("security.dialog_enable_delay", 250);
We set this pref in almost every restart + addon test.
The default value is 1000. And I get no more jsbridge dc with the default value!
This was introduced in bug 923723
I also tested a 0 delay and got good results with that as well.
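Leftover prefs like this one can be spotted by comparing the `user_pref(...)` lines of two profiles. A small sketch in Python 3, assuming the prefs are stored one per line in the form quoted above; the function names are hypothetical and the parsing is deliberately naive (values are kept as raw strings):

```python
import re

# Matches lines such as: user_pref("security.dialog_enable_delay", 250);
_USER_PREF = re.compile(r'^user_pref\("([^"]+)",\s*(.+)\);\s*$')


def read_user_prefs(text):
    """Parse the contents of a prefs file into a {name: raw_value} dict."""
    prefs = {}
    for line in text.splitlines():
        match = _USER_PREF.match(line.strip())
        if match:
            prefs[match.group(1)] = match.group(2)
    return prefs


def pref_differences(affected, clean):
    """Return {name: (affected_value, clean_value)} for prefs that differ.

    A value of None means the pref is not set in that profile.
    """
    names = sorted(set(affected) | set(clean))
    return {
        name: (affected.get(name), clean.get(name))
        for name in names
        if affected.get(name) != clean.get(name)
    }
```

Run against the prefs of the affected profile and of a fresh one, `pref_differences` would list `security.dialog_enable_delay` with its 250 value on one side and nothing on the other.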
| Reporter | ||
Comment 39•11 years ago
|
||
This is used in 3 places:
Original XPI install confirm dialog:
http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/extensions/content/xpinstallConfirm.js#27
Some uses in common dialogs:
http://dxr.mozilla.org/mozilla-central/source/toolkit/components/prompts/src/CommonDialog.jsm#166
Some download helper dialog:
http://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/downloads/nsHelperAppDlg.js#511
| Reporter | ||
Comment 40•11 years ago
|
||
Just a small update. This has the pref again (I had the pref saved in the profile before).
Basically comment 36.
Using a Nightly Debug version it should crash
> for i in {1..10}; do mozmill -m firefox/tests/functional/restartTests/testAddons_installTheme/manifest.ini -b /Applications/FirefoxNightlyDebug.app/ --profile=../profile/p13; done
test2 from the same folder is empty for me.
Attachment #8414455 -
Attachment is obsolete: true
Comment 27 looks to have been moved to bug 1003766
Flags: needinfo?(bent.mozilla)
| Reporter | ||
Comment 42•11 years ago
|
||
| Reporter | ||
Comment 43•11 years ago
|
||
From local testing this should alleviate the failures.
I propose to land this, and monitor the results. I'm not sure if this will reduce the JSBridge disconnects completely but it may greatly help.
Attachment #8416437 -
Flags: review?(andreea.matei)
Comment 44•11 years ago
|
||
Comment on attachment 8416437 [details] [diff] [review]
fix1_increase_delay.patch
Review of attachment 8416437 [details] [diff] [review]:
-----------------------------------------------------------------
We should really centralize those constants so we only have to define them once. Also, this is an infrastructure bug. You might want to fix this in a newly filed bug, so we can continue investigating this problem here. Keep in mind that update tests are also failing, and those failures are not related to these changes.
Comment 45•11 years ago
|
||
Comment on attachment 8416437 [details] [diff] [review]
fix1_increase_delay.patch
Review of attachment 8416437 [details] [diff] [review]:
-----------------------------------------------------------------
Please file a new bug and attach there. I would like to see this landed today, so we can get results over the weekend.
Attachment #8416437 -
Flags: review?(andreea.matei)
Updated•11 years ago
|
| Reporter | ||
Comment 46•11 years ago
|
||
And inbound pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0de56f72315e&tochange=b8051da2a530
Well, there's a high chance that either this or the mc one is wrong (I've had low reproduction rates with older builds).
Currently last good inbound build has 40+ good runs, I usually see this once in a 20 run loop.
I'll rerun the last good build a few times more.
| Reporter | ||
Comment 47•11 years ago
|
||
100 more runs on last good inbound build. All good.
Comment 48•11 years ago
|
||
I would still favor getting this fixed in Mozmill proper. Any chance to see it failing 100% of the time with delays even smaller than 250ms in the addon installation dialog?
| Reporter | ||
Comment 49•11 years ago
|
||
I'll run more tests.
| Reporter | ||
Updated•11 years ago
|
Assignee: nobody → andrei.eftimie
The "worker is null" stack trace looks like bug 995162.
Comment 51•11 years ago
|
||
Ah, good to know. Thanks David. That at least reduces possible candidates. So bug 1005487 may still be a hot candidate for us here.
| Reporter | ||
Comment 52•11 years ago
|
||
Reducing priority since this is not affecting daily runs since we landed bug 1005035.
Priority: P1 → P2
Comment 53•11 years ago
|
||
Whenever we get this fixed, we should also revert the patch from bug 1005035.
Assignee: andrei.eftimie → nobody
Component: Infrastructure → Mozmill
Priority: P2 → --
Product: Mozilla QA → Testing
QA Contact: hskupin
Hardware: x86 → All
Whiteboard: [mozmill-2.1?]
Comment 54•11 years ago
|
||
Finally we have a reproducible testcase here, where it fails all the time on mm-osx-109-3:
http://mm-ci-master.qa.scl3.mozilla.com:8080/job/ondemand_functional/6634/console
I was watching the testrun via VNC and when we run this test, the browser seems to stall for a moment. Then the browser window closes, but the application is still visible in the dock. After the jsbridge timeout the process is killed. So this specific issue doesn't seem to be a Mozmill bug.
We don't know yet why we quit. Andrei, would you be able to check that tomorrow? Maybe it is reproducible even when you run it yourself on the box? If yes, it would be good to modify mozmill to print out the exit code of Firefox. That might help us get more details about this mysterious shutdown.
Flags: needinfo?(andrei.eftimie)
Comment 55•11 years ago
|
||
Oh wait. This geolocation test actually restarts the browser. So what we might face here is a really long shutdown time of Firefox which exceeds even 60s! This should really be investigated.
Comment 56•11 years ago
|
||
Andrei is on PTO so I'll do some investigation.
I can reproduce the jsbridge disconnect with only running that test
> mozmill -t firefox/tests/functional/testGeolocation/testNotNowShareLocation.js -b /Applications/Firefox.app/
Comment 57•11 years ago
|
||
The above test was made on the mm-osx-109-3 CI.
Comment 58•11 years ago
|
||
Ok, so let's spin this out into a new mozmill-test bug for investigation. If it's a core bug we will need to file yet another one.
Comment 59•11 years ago
|
||
Andrei, if we still see this please use the latest Mozmill code on master for investigation. Thanks.
Comment 60•11 years ago
|
||
There is nothing actionable on this bug. Lately we haven't seen any large number of disconnects anymore, so we seem to be running very stably. If we see something specific, we should file a new bug with exact details of the problem. This bug has become more of a meta bug.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [mozmill-2.1?]
| Assignee | ||
Updated•9 years ago
|
Product: Testing → Testing Graveyard