Closed
Bug 732363
Opened 13 years ago
Closed 7 years ago
Intermittent (on Firefox) or Permanent (on Thunderbird Windows): TEST-UNEXPECTED-FAIL | test_net_addr.js | test failed (with xpcshell return code: 0), see following log from (NS_ERROR_NOT_CONNECTED) [nsISocketTransport.getScriptableSelfAddr]
Categories
(Core :: Networking, defect, P3)
Core
Networking
Tracking
()
RESOLVED
INCOMPLETE
mozilla14
Tracking | Status | |
---|---|---|
firefox13 | --- | verified |
People
(Reporter: standard8, Assigned: drice)
References
()
Details
(Keywords: intermittent-failure, regression, Whiteboard: [test disabled on Windows on mozilla13+; patch 603236 and 606194 fixed in mozilla13] [necko-backlog])
Attachments
(2 files, 1 obsolete file)
3.02 KB,
patch
|
jdm
:
review+
|
Details | Diff | Splinter Review |
843 bytes,
patch
|
jdm
:
review+
akeybl
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Since bug 720846 landed, we've been seeing this failure on our Windows virtual machines, which are the Thunderbird builders:
http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1330655617.1330658305.8294.gz#err0
TEST-INFO | e:\buildbot\comm-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\netwerk\test\unit\test_net_addr.js | running test ...
TEST-UNEXPECTED-FAIL | e:\buildbot\comm-central-win32-opt-unittest-xpcshell\build\xpcshell\tests\netwerk\test\unit\test_net_addr.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-INFO | e:/buildbot/comm-central-win32-opt-unittest-xpcshell/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | server: listening on
TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | (xpcshell/head.js) | test 2 finished
TEST-INFO | (xpcshell/head.js) | running event loop
TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | e:/buildbot/comm-central-win32-opt-unittest-xpcshell/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | Starting testIpv4
TEST-INFO | (xpcshell/head.js) | test 2 finished
TEST-INFO | e:/buildbot/comm-central-win32-opt-unittest-xpcshell/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | server: got client connection
TEST-UNEXPECTED-FAIL | e:/buildbot/comm-central-win32-opt-unittest-xpcshell/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | in TestServer.onSocketAccepted - Unexpected exception [Exception... "Component returned failure code: 0x804b000c (NS_ERROR_NOT_CONNECTED) [nsISocketTransport.getScriptableSelfAddr]" nsresult: "0x804b000c (NS_ERROR_NOT_CONNECTED)" location: "JS frame :: e:/buildbot/comm-central-win32-opt-unittest-xpcshell/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js :: <TOP_LEVEL> :: line 134" data: no], see following stack:
undefined
I'm not quite sure what is going on here, I don't see why it would have got that far and then not be connected...
Reporter | ||
Updated•13 years ago
|
Whiteboard: [tb-orange]
Assignee | ||
Comment 1•13 years ago
|
||
This isn't a problem with getScriptableSelfAddr. The error is coming from the non-scriptable getSelfAddr which it uses. I may have made some incorrect assumptions in my test case, but it seems to me that if the socket was accepted, and the output and input streams are open, we should not be seeing errors about not being connected. There's a chance that it disconnects (race condition) but if that were the case, this error would be sporadic.
This error is consistent, right?
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 3•13 years ago
|
||
Does seem to be permanent on Tb Windows, but then based on my memory from when Firefox ran tests on Win2K3 VMs, they are permanently brutally slow, and prone to losing any race condition you offer them.
Sort of surprising that the first Firefox failure would be a Win7 opt, though, typically WinXP debug is the closest match for the slowness of a 2K3 VM.
Blocks: 438871
Summary: Permanent Orange(?) Thunderbird Windows: TEST-UNEXPECTED-FAIL | test_net_addr.js | test failed (with xpcshell return code: 0), see following log → Intermittent (on Firefox) or Permanent (on Thunderbird Windows): TEST-UNEXPECTED-FAIL | test_net_addr.js | test failed (with xpcshell return code: 0), see following log from (NS_ERROR_NOT_CONNECTED) [nsISocketTransport.getScriptableSelfAddr]
Whiteboard: [tb-orange] → [tb-orange][orange]
Version: unspecified → Trunk
Comment 4•13 years ago
|
||
so we should backout 720846, right?
Assignee | ||
Comment 5•13 years ago
|
||
Sorry, I haven't been ignoring this; I've just been busy with other things. The test is failing, but I don't have any reason to believe that the code change is problematic. I'll provide a patch to back out the test and try to come back and create one that does not cause these failures.
jdm / biesi: Is there anything wrong with removing the test for the time being? Does this violate some "feature changes must have working tests" rule?
Comment 6•13 years ago
|
||
Can you instead disable the test just on Windows? Seems like that would fix this problem while still keeping some test coverage.
Assignee | ||
Comment 7•13 years ago
|
||
I think I figured out what's going on. I may propose a patch instead of removing it, but I'd like someone to chime in with "yeah, that sounds plausible" or something else to confirm my hypothesis is reasonable.
My assumption is that the result of transport.openOutputStream is being cleaned up because I am not keeping a reference to it. When that happens, the client side closes the connection.
My patch would be to hold onto the output stream in a var alongside |transport|.
Comment 8•13 years ago
|
||
That sounds incredibly likely. nsSocketTransport2::OpenOutputStream looks like it confirms this hypothesis, since in the blocking case it creates a pipe and doesn't store it anywhere but in the return value.
Assignee | ||
Comment 9•13 years ago
|
||
Patch attached. Verified that it still passes for me (though it always did) and I'll watch http://build.mozillamessaging.com/tinderboxpushlog/?tree=ThunderbirdTrunk for success.
jdm for review/checkin.
Attachment #602969 -
Flags: review?(josh)
Attachment #602969 -
Flags: checkin?(josh)
Comment 10•13 years ago
|
||
Comment on attachment 602969 [details] [diff] [review]
test fix 01
Review of attachment 602969 [details] [diff] [review]:
-----------------------------------------------------------------
This will have the same problem. The last reference to the stream disappears when the function returns; we need to store a reference to the stream outside of the function.
You can also leave off the checkin? flag in future; I'll ensure your patches are checked in.
Attachment #602969 -
Flags: review?(josh)
Attachment #602969 -
Flags: review-
Attachment #602969 -
Flags: checkin?(josh)
Assignee | ||
Comment 11•13 years ago
|
||
The inner function (set as |acceptCallback|) holds a reference to it. As long as that function is reachable, so is |output|. That function is only unreachable when |TestServer.reset| clears out the |acceptCallback|, and we no longer need the stream or socket.
Comment 12•13 years ago
|
||
I'm pretty confident that that's an implementation detail that could be defeated by optimizations in Spidermonkey. Let's just take the safe route and avoid any chance of oranges.
Assignee | ||
Comment 13•13 years ago
|
||
As requested. Passes locally (as always) and stores objects of importance in a global, which should alleviate the optimization concern.
Attachment #602969 -
Attachment is obsolete: true
Attachment #603236 -
Flags: review?(josh)
Assignee | ||
Updated•13 years ago
|
Attachment #603236 -
Attachment is patch: true
Updated•13 years ago
|
Attachment #603236 -
Flags: review?(josh) → review+
Comment 14•13 years ago
|
||
Comment 15•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
Comment 16•13 years ago
|
||
SeaMonkey Windows builds were unbroken and 1st run had this bug :-/
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1331235913.1331241194.13591.gz
WINNT 5.2 comm-central-trunk debug test xpcshell on 2012/03/08 11:45:13
s: cb-seamonkey-win32-03
{
WARNING: CheckLinkStatus called on main thread! No check performed. Assuming link is up, status is unknown.: file e:/builds/slave/comm-cen-trunk-w32-dbg/build/mozilla/netwerk/system/win32/nsNotifyAddrListener.cpp, line 414
TEST-INFO | e:/builds/slave/test/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | server: listening on
TEST-INFO | e:/builds/slave/test/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | Starting testIpv4
TEST-INFO | e:/builds/slave/test/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | server: got client connection
TEST-UNEXPECTED-FAIL | e:\builds\slave\test\build\xpcshell\tests\netwerk\test\unit\test_net_addr.js | test failed (with xpcshell return code: 0), see following log:
TEST-UNEXPECTED-FAIL | e:/builds/slave/test/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js | in TestServer.onSocketAccepted - Unexpected exception [Exception... "Component returned failure code: 0x804b000c (NS_ERROR_NOT_CONNECTED) [nsISocketTransport.getScriptableSelfAddr]" nsresult: "0x804b000c (NS_ERROR_NOT_CONNECTED)" location: "JS frame :: e:/builds/slave/test/build/xpcshell/tests/netwerk/test/unit/test_net_addr.js :: <TOP_LEVEL> :: line 145" data: no], see following stack:
undefined
}
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 18•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTry/1331689984.1331693837.25294.gz
WINNT 5.2 try-comm-central opt test xpcshell on 2012/03/13 18:53:04
Still happening: not (fully) fixed.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 19•13 years ago
|
||
I need this test run on a build system which is currently failing, but for a debug build (or FORCE_PR_LOG) with NSPR_LOG_MODULES=nsSocketTransport:5 and then see the PR_LOG output. Ideally run solo, rather than with all tests.
Maybe I can get a windows thunderbird tryserver on loan?
I have not been able to reproduce this by introducing sleeps. So I'm a bit stumped if someone cannot help me with the above.
Assignee | ||
Comment 20•13 years ago
|
||
This is outside my expertise and ability to debug, specifically because of the Mozilla build infrastructure. biesi says we can always skip windows to avoid oranges, which this patch does.
I can't commit much more time to this issue, especially because I do not see the value. No one's been able to point out a flaw in the test, yet it fails anyway. The failure is unrelated to the changes it is trying to test, so it would be wasteful to throw away the change with the test. With this skip, we can at least see the advantage of the original change (nsNetAddr).
Attachment #606194 -
Flags: review?(josh)
Comment 21•13 years ago
|
||
Comment on attachment 606194 [details] [diff] [review]
skip windows
[Checked in: Comment 25 & 30]
Review of attachment 606194 [details] [diff] [review]:
-----------------------------------------------------------------
Agreed on all counts. Thanks for the work you've sunk into this!
Attachment #606194 -
Flags: review?(josh) → review+
Comment 22•13 years ago
|
||
Comment 23•13 years ago
|
||
While here, some nits:
32 do_print('server: listening on', this.port);
shoule be
32 do_print("server: listening on port " + this.port);
139 }
shoule be
139 };
178 testDataStore.output = testDataStore.transport.openOutputStream(Ci.nsITransport.OPEN_BLOCKING,0,0);
shoule be
178 testDataStore.output = testDataStore.transport.openOutputStream(Ci.nsITransport.OPEN_BLOCKING, 0, 0);
163 do_execute_soon(run_next_test);
run_next_test() already does do_execute_soon(_run_next_test).
Does it need to be doubled? (If so, a comment might be welcome.)
***
49 this.selfAddr = trans.getScriptableSelfAddr();
50 this.peerAddr = trans.getScriptablePeerAddr();
succeed,
then
52 this.acceptCallback();
...
141 serv.acceptCallback = function() {
142 // disable the timeoutCallback
143 serv.timeoutCallback = function(){};
144
145 var selfAddr = testDataStore.transport.getScriptableSelfAddr();
fails.
(which is odd, as all these are supposed to happen synchronously.)
(Trying to guess, facts and code...)
Fwiw, should you pass 'trans' and add something like
do_check_true(trans.isAlive());
do_check_eq(trans.getScriptableSelfAddr(), serv.selfAddr);
do_check_true(testDataStore.transport.isAlive());
do_check_eq(trans, testDataStore.transport);
? (At least as a debug try.)
Should you do_print() in
63 onStopListening: function(socket) {} ,
fwiw?
Comment 24•13 years ago
|
||
(In reply to Serge Gautherie (:sgautherie) from comment #23)
> Fwiw, should you pass 'trans' and add something like
> do_check_true(trans.isAlive());
> do_check_eq(trans.getScriptableSelfAddr(), serv.selfAddr);
> do_check_true(testDataStore.transport.isAlive());
> do_check_eq(trans, testDataStore.transport);
> ? (At least as a debug try.)
And probably add "same" checks in onSocketAccepted(), so we have even more details.
***
Afaict this is still Windows (near-)perma-orange on both Dbg SeaMonkey and Opt Thunderbird.
I can understand/support disabling the test (after we run out of (debug) idea).
Yet, the issue should be easily reproducible on TB Try and I don't think we should give up (as in closing bug) on finding out what is happening as this result seems very unexpected.
Comment 25•13 years ago
|
||
Comment 26•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTrunk/1331908000.1331910589.8032.gz
WINNT 5.2 comm-central test xpcshell on 2012/03/16 07:26:40
+
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1331918260.1331923715.3372.gz
WINNT 5.2 comm-central-trunk debug test xpcshell on 2012/03/16 10:17:40
V.Fixed wrt to this disabling patch (only).
Whiteboard: [tb-orange][orange] → [test disabled on mozilla14; patch 603236 fixed in mozilla13] [tb-orange][orange]
Comment 27•13 years ago
|
||
Comment on attachment 606194 [details] [diff] [review]
skip windows
[Checked in: Comment 25 & 30]
[Approval Request Comment]
Regression caused by (bug #): Bug 720846.
User impact if declined: None, but perma-orange on SeaMonkey and Thunderbird.
Testing completed (on m-c, etc.): m-i, m-c.
Risk to taking this patch (and alternatives if risky): None, test-only.
String changes made by this patch: None.
Attachment #606194 -
Flags: approval-mozilla-aurora?
Updated•13 years ago
|
Attachment #603236 -
Attachment description: test fix 02 → test fix 02
[Checked in: Comment 15]
Updated•13 years ago
|
Attachment #606194 -
Attachment description: skip windows → skip windows
[Checked in: Comment 25]
Comment 28•13 years ago
|
||
Comment on attachment 606194 [details] [diff] [review]
skip windows
[Checked in: Comment 25 & 30]
[Triage Comment]
Bug 720846 was first in Firefox as of 13, so if disabling this test is the right path forward, we should do the same on Aurora.
Attachment #606194 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•13 years ago
|
Keywords: checkin-needed
Whiteboard: [test disabled on mozilla14; patch 603236 fixed in mozilla13] [tb-orange][orange] → [c-n: 6801ef0d83b8 to m-a] [test disabled on mozilla14; patch 603236 fixed in mozilla13] [tb-orange][orange]
Comment 30•13 years ago
|
||
Landed on Aurora.
https://hg.mozilla.org/releases/mozilla-aurora/rev/8b1899e39fa7
Keywords: checkin-needed
Whiteboard: [c-n: 6801ef0d83b8 to m-a] [test disabled on mozilla14; patch 603236 fixed in mozilla13] [tb-orange][orange] → [test disabled on mozilla13/14; patch 603236 fixed in mozilla13] [tb-orange][orange]
Updated•13 years ago
|
Attachment #606194 -
Attachment description: skip windows
[Checked in: Comment 25] → skip windows
[Checked in: Comment 25 & 30]
Updated•13 years ago
|
status-firefox13:
--- → fixed
Whiteboard: [test disabled on mozilla13/14; patch 603236 fixed in mozilla13] [tb-orange][orange] → [test disabled on mozilla13+; patch 603236 and 606194 fixed in mozilla13] [tb-orange][orange]
Target Milestone: mozilla13 → mozilla14
Comment 31•13 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey-Aurora/1333835055.1333840344.14793.gz
WINNT 5.2 comm-aurora debug test xpcshell on 2012/04/07 14:44:15
firefox13: verified, wrt disabling the test.
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [test disabled on mozilla13+; patch 603236 and 606194 fixed in mozilla13] [tb-orange][orange] → [test disabled on mozilla13+; patch 603236 and 606194 fixed in mozilla13]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 46•12 years ago
|
||
Ftr,
Comment 32 to comment 45 are all on "Linux".
This seems a "rare" intermittent-failure there : no permanent orange on TB (or SM).
Whiteboard: [test disabled on mozilla13+; patch 603236 and 606194 fixed in mozilla13] → [test disabled on Windows on mozilla13+; patch 603236 and 606194 fixed in mozilla13]
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•9 years ago
|
Whiteboard: [test disabled on Windows on mozilla13+; patch 603236 and 606194 fixed in mozilla13] → [test disabled on Windows on mozilla13+; patch 603236 and 606194 fixed in mozilla13] [necko-backlog]
Comment hidden (Intermittent Failures Robot) |
Comment 60•9 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Comment 61•7 years ago
|
||
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P3 → P1
Comment 62•7 years ago
|
||
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Comment 63•7 years ago
|
||
no instances in 4 months, lets close this out
Status: REOPENED → RESOLVED
Closed: 13 years ago → 7 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•