quitApplication takes a long time; mutt python tests broken on master

RESOLVED FIXED

Status

Testing Graveyard
Mozmill
RESOLVED FIXED
6 years ago
a year ago

People

(Reporter: Jeff Hammel, Unassigned)

Tracking

Details

Attachments

(2 attachments)

(Reporter)

Description

6 years ago
Running `mutt test.py` against an up-to-date master HEAD, I get:

INFO | 
INFO | (<testapi.TestMozmillAPI testMethod=test_api>, 'Traceback (most
recent call last):\n  File
"/home/jhammel/tmp/src/mozmill/mutt/mutt/tests/python/testapi.py",
line 24, in test_api\n    self.assertTrue(len(results.passes) ==
passes)\nAssertionError: False is not true\n')
INFO | 
INFO | (<testpersisted.TestMozmillPersisted
testMethod=test_persisted>, 'Traceback (most recent call last):\n
File
"/home/jhammel/tmp/src/mozmill/mutt/mutt/tests/python/testpersisted.py",
line 38, in test_persisted\n    self.assertTrue(len(results.passes) ==
1)\nAssertionError: False is not true\n')
INFO | 
INFO | (<testpersisted.TestMozmillPersisted
testMethod=test_persisted_shutdown>, 'Traceback (most recent call
last):\n  File
"/home/jhammel/tmp/src/mozmill/mutt/mutt/tests/python/testpersisted.py",
line 48, in test_persisted_shutdown\n
self.assertTrue(len(results.passes) == 1)\nAssertionError: False is
not true\n')
INFO | 
INFO | (<testmultiplerun.TestMozmillAPI testMethod=test_runtwice>,
'Traceback (most recent call last):\n  File
"/home/jhammel/tmp/src/mozmill/mutt/mutt/tests/python/testmultiplerun.py",
line 25, in test_runtwice\n    self.assertTrue(len(results.passes) ==
passes)\nAssertionError: False is not true\n')
INFO | 
INFO | (<test_bug690154.TestBug690154 testMethod=test_JSON_structure>,
'Traceback (most recent call last):\n  File
"/home/jhammel/tmp/src/mozmill/mutt/mutt/tests/python/test_bug690154.py",
line 34, in test_JSON_structure\n    self.assertTrue(fails[\'name\']
== \'<TOP_LEVEL>\')\nKeyError: \'name\'\n')

In addition I see some:

Timeout: bridge.execFunction("80fbbdf0-fc1e-11e0-be5f-00231456cca4",
bridge.registry["{2e6afad5-d7a1-4ade-b0f6-e6f028da2917}"]["cleanQuit"],
[])
(Reporter)

Comment 1

6 years ago
Looking at testpersisted.py, test_persisted_shutdown fails like:

(Pdb) results.passes
[]
(Pdb) results.fails
[{'passes': [], 'fails': [{'exception': {'message': 'Disconnect Error: Application unexpectedly closed'}}], 'passed': 0, 'failed': 1}]
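A minimal sketch (hypothetical names, not the actual mozmill/mutt code) of why a jsbridge disconnect surfaces as "AssertionError: False is not true": the pass records never make it back across the bridge, so `results.passes` stays empty and `len(results.passes) == 1` is False.

```python
class Results:
    """Stand-in for the mozmill results object (hypothetical)."""
    def __init__(self):
        self.passes = []   # filled from events received over jsbridge
        self.fails = []

def collect(results, events):
    """Fold bridge events into the results object."""
    for event in events:
        if event.get('exception'):
            results.fails.append(event)
        else:
            results.passes.append(event)
    return results

# Normal run: one passing test crosses the bridge.
ok = collect(Results(), [{'name': 'test_something'}])
assert len(ok.passes) == 1

# Disconnected run: only the disconnect error arrives, no passes.
bad = collect(Results(), [{'exception': {
    'message': 'Disconnect Error: Application unexpectedly closed'}}])
assert len(bad.passes) == 0   # so assertTrue(len(results.passes) == 1) fails
```

As an aside, `assertEqual(len(results.passes), 1)` would report "0 != 1" instead of the unhelpful "False is not true".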
(Reporter)

Comment 2

6 years ago
The persisted data also does not get updated:

m.persisted
{'foo': 'bar', 'bar': 'foo', 'number': 1}

Should be:

{u'fleem': 2, u'foo': u'bar', u'bar': u'bar', u'number': 2}
(Reporter)

Comment 3

6 years ago
Created attachment 568758 [details]
timeout

So this does not fail, but gives a jsbridge timeout, which I assume is related:

TEST-START | test_bad.js | setupModule
TEST-START | test_bad.js | test_something
TEST-PASS | test_bad.js | test_something
INFO | Timeout: bridge.execFunction("e90340a6-fc20-11e0-9590-00231456cca4", bridge.registry["{815bf0ab-c41a-4464-b342-aea11cfc4cf3}"]["runTestFile"], ["/home/jhammel/tmp/test_bad.js", false, null])
INFO | 
INFO | Passed: 1
INFO | Failed: 0
INFO | Skipped: 0
(Reporter)

Comment 4

6 years ago
So in a different mozmill checkout, I don't get any of these errors.  The trees should be in an identical state, but obviously something wonky has happened.  I will close, rmtree, and hope not to see this again
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 5

6 years ago
This has reappeared for me (again, mysteriously) with a fresh checkout.  Will investigate when this becomes a blocker
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(Reporter)

Comment 6

6 years ago
Still seeing this today. I have no idea what is going on :/
(Reporter)

Comment 7

6 years ago
Note also that it takes a long time to shut down the tests.  Usually, the test run on a slower computer takes 10-15 minutes; on this computer it took more like 40 minutes.  I get a lot of lines like:

INFO | Timeout: bridge.execFunction("37e5229a-04ad-11e1-9f54-00262df16844", bridge.registry["{185a4a90-ec6f-4b86-8f2d-d88f4e4ff7e1}"]["cleanQuit"], [])

It takes about 1:30 each time it shuts down :(
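A toy illustration (not the real jsbridge API) of why each shutdown stalls for a fixed interval: a blocking bridge call polls for a response and, when the other side has already gone away, waits out its entire timeout before logging the "Timeout: bridge.execFunction(...)" line. The 90-second constant here is an assumption matching the ~1:30 stall observed above.

```python
import time

BRIDGE_TIMEOUT = 90  # seconds; assumed, matches the ~1:30 stall per shutdown

def exec_function_with_timeout(responded, timeout=BRIDGE_TIMEOUT, poll=0.1):
    """Poll for a response; give up after `timeout` seconds.

    `responded` is a callable standing in for "a reply arrived over
    the bridge". Returns False on timeout, which the harness would
    log as "Timeout: bridge.execFunction(...)".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if responded():
            return True
        time.sleep(poll)
    return False

# If the application quit before acknowledging cleanQuit, every call
# burns the full timeout (shortened here so the demo returns quickly):
assert exec_function_with_timeout(lambda: False, timeout=0.2) is False
assert exec_function_with_timeout(lambda: True) is True
```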
(Reporter)

Comment 8

6 years ago
I also get a lot of:

INFO | 
INFO | *** e = [Exception... "Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]"  nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)"  location: "JS frame :: chrome://browser/content/utilityOverlay.js :: getShellService :: line 349"  data: no]
INFO | 
Xlib:  extension "GLX" missing on display ":0.0".
INFO | Timeout: bridge.set("78315d3a-04d2-11e1-b3d7-0024d7065b60", Components.utils.import('resource://mozmill/driver/mozmill.js'))
INFO | 
INFO | Timeout: bridge.set("78caaaf8-04d2-11e1-b3d7-0024d7065b60", Components.utils.import('resource://mozmill/driver/mozmill.js'))
INFO | 
INFO | Xlib:  extension "GLX" missing on display ":0.0".
INFO | 
INFO | *** e = [Exception... "Component returned failure code: 0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE) [nsIJSCID.getService]"  nsresult: "0x80570016 (NS_ERROR_XPC_GS_RETURNED_FAILURE)"  location: "JS frame :: chrome://browser/content/utilityOverlay.js :: getShellService :: line 349"  data: no]
(Reporter)

Updated

6 years ago
Depends on: 704023
(Reporter)

Updated

6 years ago
No longer depends on: 704023
(Reporter)

Comment 9

6 years ago
So, testing on another computer that had always passed before (also Linux, also running Nightly), I now get the same results.  I have no idea why one computer would have shown symptoms and the other not initially -- possibly a side effect of an Ubuntu update?

Does anyone on nightly, mozmill master or hotfix-2.0 (it is an issue with both; I haven't looked into 1.5.x much as it does not have tests), and an up-to-date Ubuntu 11.10 machine -not- see this?  It looks, from casual inspection, like something is wrong with quitApplication.  The JS tests will -apparently- pass (though we don't have rigorous accounting for this -- they don't obviously fail, at any rate).  The python tests won't pass, my hypothesis being that something is getting killed before data makes it across the bridge (e.g. a race condition), so the results are not correctly gathered.  As best I can tell, there was no significant change to mozmill code associated with this breakage.  On the newly installed Ubuntu 11.10 computer, it occurred as soon as it was installed (and never on 11.04).  On this computer, upgraded to 11.10, it initially worked after the upgrade (and always before) but is now having problems.
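The hypothesized race above can be sketched with a toy model (this is illustrative only, not mozmill code): if the application shuts down before the reporting side flushes its result across the bridge, the harness sees no passes even though the test itself passed.

```python
import threading
import time

def run(kill_delay, report_delay):
    """Simulate a test run: a reporter thread tries to deliver a 'pass'
    result, while the main thread shuts the application down after
    `kill_delay` seconds."""
    results = []
    alive = threading.Event()
    alive.set()

    def report():
        time.sleep(report_delay)
        if alive.is_set():
            results.append('pass')   # result crosses the "bridge"

    t = threading.Thread(target=report)
    t.start()
    time.sleep(kill_delay)
    alive.clear()                    # application shuts down
    t.join()
    return results

assert run(kill_delay=0.3, report_delay=0.05) == ['pass']  # report wins
assert run(kill_delay=0.05, report_delay=0.3) == []        # shutdown wins
```

In the losing case the harness ends up with empty `results`, which is exactly the empty `results.passes` seen in comment 1.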
OS: All → Linux
Summary: mutt python tests broken on master → quitApplication takes a long time; mutt python tests broken on master
Jeff, it would be helpful if you could check with which version of Nightly this started.  If it's a regression in Firefox we need that range.  Not sure if this could be related, but lately some silent update patches landed; those could have an effect.  But I need more information from your side first.
(Reporter)

Comment 11

6 years ago
I can't be particularly precise about the regression range without bisecting.  Both Ubuntu 11.10 computers were upgraded a week after the Ubuntu release and both run Firefox Nightly.  Neither had symptoms before, and one of them has only shown symptoms recently.  I also have another computer that is on neither Nightly nor Ubuntu 11.10.  I will try to bisect this when I have the time, but I am curious whether this is a universal result or not.
Does it happen with Aurora, Beta, or Release builds? That's the easiest check you could do to prove whether it's a regression in Firefox or not.
(Reporter)

Comment 13

6 years ago
This is still an issue that is (probably among other things) causing tests to fail on my computer.
(Reporter)

Comment 14

6 years ago
Interestingly... this does not occur on another computer, also linux/ubuntu.  o_O I'm a bit perplexed about what to do about this, but for the time being I'm going to make sure everything is sane on the box where it works.  The computer it fails on is much, much faster, so there could be some evil race condition, but that is a wild guess.  The versions of nightly and mozmill are the same, and the environment is very close to the same (as close as it can be for computers with different architectures).  If anyone wants any more debugging info, please let me know
(Reporter)

Updated

6 years ago
Duplicate of this bug: 740449

Comment 16

6 years ago
Created attachment 626397 [details] [diff] [review]
Hacky Fix for crash

So, this isn't a timeout; this is a crash.  It only happens on Linux, and as best I can tell we are crashing in jsbridge because we are still sending status back and forth while Firefox is shutting down.  It appears to be a timeout on a release build, but in a debug build you can see the stack from the abort that we hit in NSPR.  Oddly, if we just wait a bit on shutdown then we don't see this.  But if we use observers to capture when we are shutting down and try to close the bridge (which would seem to be the right fix), we still hit the crash.

So, this patch solves the issue, but it is hacky.
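A hedged sketch of the "wait a bit on shutdown" workaround described above (the actual patch in attachment 626397 may differ; all class and method names here are stand-ins): stop sending status over the bridge, ask the application to quit, sleep out a grace period rather than racing the browser's teardown, and only then close the socket.

```python
import time

class FakeBridge:
    """Stand-in for the jsbridge back channel (hypothetical API)."""
    def __init__(self):
        self.sending = True
        self.closed = False
    def stop_sending(self):
        self.sending = False
    def close(self):
        self.closed = True

class FakeApp:
    """Stand-in for the application under test (hypothetical API)."""
    def __init__(self):
        self.quit_requested = False
    def request_quit(self):
        self.quit_requested = True

def clean_quit(bridge, app, grace=0.1):
    """Stop bridge traffic, request quit, wait, then close the bridge."""
    bridge.stop_sending()   # no more status traffic during teardown
    app.request_quit()
    time.sleep(grace)       # the "hacky" part: wait out application teardown
    bridge.close()

bridge, app = FakeBridge(), FakeApp()
clean_quit(bridge, app)
assert not bridge.sending and bridge.closed and app.quit_requested
```

The sleep is what makes it hacky: it trades a fixed delay for not crashing, instead of synchronizing on an actual shutdown notification.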
Attachment #626397 - Flags: review?(jhammel)
(Reporter)

Comment 17

6 years ago
Comment on attachment 626397 [details] [diff] [review]
Hacky Fix for crash

Looks good!  I've run it at least half a dozen times on a machine where this was a problem and I consistently get success \o/
Attachment #626397 - Flags: review?(jhammel) → review+

Comment 18

6 years ago
Landed hacky fix for now. https://github.com/mozautomation/mozmill/commit/5564d4f6b4f493492ddeb462a61d55f1a1487fcd
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

a year ago
Product: Testing → Testing Graveyard