Intermittent dom/network/tests/test_tcpsocket_client_and_server_basics.html, dom/network/tests/test_tcpsocket_jsm.html | Test timed out.

RESOLVED FIXED in Firefox 51

Status

Product: Core
Component: DOM
Priority: P3
Severity: normal
Status: RESOLVED FIXED
a year ago
a year ago

People

(Reporter: Treeherder Bug Filer, Assigned: wlach)

Tracking

({intermittent-failure})

unspecified
mozilla51
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox51 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

a year ago
treeherder
Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=33886059&repo=mozilla-inbound

https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-macosx64/1471291096/mozilla-inbound_yosemite_r7_test-mochitest-other-bm135-tests1-macosx-build301.txt.gz

Comment 1

a year ago
25 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* autoland: 13
* mozilla-inbound: 7
* fx-team: 4
* mozilla-central: 1

Platform breakdown:
* linux64: 17
* windows7-32: 6
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-15&endday=2016-08-21&tree=all

This is pretty frequent and tends to cause failures in subsequent tests as well. Any chance you could take a look, Josh?
Flags: needinfo?(josh)

Looks like the same problem as bug 1295483.

Updated

a year ago
Blocks: 1104156
Flags: needinfo?(josh)

Comment 4

a year ago
15 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 5
* mozilla-central: 5
* autoland: 3
* try: 1
* fx-team: 1

Platform breakdown:
* linux64: 11
* osx-10-10: 3
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-22&endday=2016-08-22&tree=all

Comment 5

a year ago
19 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 7
* autoland: 6
* fx-team: 3
* mozilla-central: 2
* try: 1

Platform breakdown:
* linux64: 11
* osx-10-10: 5
* windows7-32: 2
* windows8-64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-23&endday=2016-08-23&tree=all

Comment 6

a year ago
16 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 9
* autoland: 6
* fx-team: 1

Platform breakdown:
* linux64: 13
* windows7-32: 2
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-26&endday=2016-08-26&tree=all

Comment 7

a year ago
72 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* autoland: 27
* mozilla-inbound: 26
* fx-team: 8
* mozilla-central: 7
* try: 4

Platform breakdown:
* linux64: 51
* osx-10-10: 12
* windows7-32: 7
* windows8-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-22&endday=2016-08-28&tree=all

Comment 8

a year ago
17 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 9
* autoland: 7
* fx-team: 1

Platform breakdown:
* linux64: 13
* windows7-32: 3
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-29&endday=2016-08-29&tree=all
FWIW, I'm looking into this. In the examples I've looked at, the problem seems to be that a socket is closed before the event we're expecting happens, so we eventually just time out waiting for an event that never arrives. Not really sure why this would be the case: it's obviously *not supposed to happen*, and nothing we're explicitly doing should trigger a close.

Next steps:

1. See if we can reproduce this on a Linux machine
2. See if we can reproduce this under rr.

It would not surprise me if this were related to bug 1295483 or possibly other tcp oranges as well, but this seems like one of the simpler manifestations of what might be a larger problem, so let's start here.
OK, I managed to reproduce this in rr (using chaos mode), though in retrospect I probably didn't need to. The bug is pretty simple: in some cases we close immediately before writing *any* data to the socket, so we wait for that event forever. There's no guarantee that even 1 byte will be written, so we shouldn't expect it. It should be sufficient to wait for the connection to close and then verify that a full set of data wasn't written. (In theory this could still fail intermittently depending on timing, but the chances of that are infinitesimally small.)
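
To illustrate the timing problem described above, here is a minimal sketch in plain JavaScript. It is not the code from test_tcpsocket_client_and_server_basics.js: `FakeServerQueue`, `waitForData`, `waitForClose`, and `closeBeforeAnyData` are hypothetical names invented for this example, and the "socket" is simulated with a timer.

```javascript
// Hypothetical stand-in for the test's server-side event queue.
// None of these names come from the real mochitest.
class FakeServerQueue {
  constructor() {
    this.received = 0;      // bytes seen so far
    this.closed = false;
    this.dataWaiters = [];  // resolvers waiting for any data
    this.closeWaiters = []; // resolvers waiting for 'close'
  }

  onData(byteCount) {
    this.received += byteCount;
    this.dataWaiters.splice(0).forEach((resolve) => resolve(this.received));
  }

  onClose() {
    this.closed = true;
    this.closeWaiters.splice(0).forEach((resolve) => resolve("close"));
  }

  // Resolves only once at least one byte has arrived.
  waitForData() {
    if (this.received > 0) return Promise.resolve(this.received);
    return new Promise((resolve) => this.dataWaiters.push(resolve));
  }

  // Resolves once the connection has closed.
  waitForClose() {
    if (this.closed) return Promise.resolve("close");
    return new Promise((resolve) => this.closeWaiters.push(resolve));
  }
}

// Simulates the close winning the race: the peer sees 'close'
// before a single byte arrives.
async function closeBeforeAnyData(totalBytesSent) {
  const queue = new FakeServerQueue();
  setTimeout(() => queue.onClose(), 0);

  // Wait for the close only, then check that less than the full
  // payload was received (zero bytes is an acceptable outcome).
  const eventType = await queue.waitForClose();
  return {
    eventType,
    received: queue.received,
    lessThanFull: queue.received < totalBytesSent,
  };
}
```

In this simulated scenario, awaiting `queue.waitForData()` before the close would never resolve, which is the shape of the timeout seen in the logs; awaiting `waitForClose()` completes regardless of how many bytes made it through.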
Assignee: nobody → wlachance
Comment hidden (mozreview-request)
Note that the above patch basically reverts a change that :jdm requested in https://bugzilla.mozilla.org/show_bug.cgi?id=1104156#c28 ("We should first use serverQueue.waitForDataWithAtLeastLength - we want to verify that at least 1 byte is received before the close message, but we should also verify that the received data is less than the length of the two big arrays. Does that make sense?"), but I'm pretty sure that isn't the right thing to assert on. Would we expect that behaviour most of the time? Yes. Is it an error if we don't see it? No.

(unless I'm missing something)
Summary: Intermittent dom/network/tests/test_tcpsocket_client_and_server_basics.html | Test timed out. → Intermittent dom/network/tests/test_tcpsocket_client_and_server_basics.html, dom/network/tests/test_tcpsocket_jsm.html | Test timed out.
Duplicate of this bug: 1295483
20 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 9
* autoland: 7
* fx-team: 2
* try: 1
* mozilla-central: 1

Platform breakdown:
* linux64: 17
* osx-10-10: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-31&endday=2016-08-31&tree=all

Comment 15

a year ago
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
23 automation job failures were associated with this bug yesterday.

Repository breakdown:
* autoland: 10
* mozilla-inbound: 7
* fx-team: 3
* mozilla-central: 2
* oak: 1

Platform breakdown:
* linux64: 19
* osx-10-10: 3
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-09-01&endday=2016-09-01&tree=all
16 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 13
* mozilla-central: 1
* fx-team: 1
* autoland: 1

Platform breakdown:
* linux64: 12
* osx-10-10: 3
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-09-02&endday=2016-09-02&tree=all
107 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 44
* autoland: 40
* fx-team: 12
* mozilla-central: 7
* try: 2
* oak: 2

Platform breakdown:
* linux64: 82
* osx-10-10: 17
* windows7-32: 8

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-08-29&endday=2016-09-04&tree=all
15 automation job failures were associated with this bug yesterday.

Repository breakdown:
* mozilla-inbound: 9
* autoland: 4
* mozilla-central: 1
* fx-team: 1

Platform breakdown:
* linux64: 10
* windows7-32: 3
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-09-05&endday=2016-09-05&tree=all

Comment 20

a year ago
mozreview-review
Comment on attachment 8786850 [details]
Bug 1295453 - Don't wait for data to be sent when closing socket immediately

https://reviewboard.mozilla.org/r/75734/#review74948

::: dom/network/tests/test_tcpsocket_client_and_server_basics.js:382
(Diff revision 1)
> -  is(serverReceived.length < (2 * bigUint8Array.length), true, 'Received array length less than sent array length');
> -
>    is((yield serverQueue.waitForEvent()).type, 'close',
>       'The close event is received after calling closeImmediately');
>  
> +  is(serverReceived.length < (2 * bigUint8Array.length), true, 'Received array length less than sent array length');

This serverReceived value is unrelated to the sends that happened in this particular test, so we can't assert anything about this case now.

What if we make a waitForAnyDataOrClose helper which uses Promise.race to catch both cases, and treat the close case as a length of 0?
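
A rough sketch of what such a helper might look like is below. Only the name `waitForAnyDataOrClose` and the use of `Promise.race` come from this comment; the two callback parameters (`waitForData`, `waitForClose`) and the shape of the returned object are assumptions made for illustration, not the test file's actual API.

```javascript
// Hypothetical sketch. `waitForData` is assumed to return a promise for a
// Uint8Array-like value, and `waitForClose` a promise that resolves when
// the 'close' event is seen.
function waitForAnyDataOrClose(waitForData, waitForClose) {
  return Promise.race([
    // Data arrived first: report how many bytes were seen.
    waitForData().then((bytes) => ({ closed: false, length: bytes.length })),
    // Close arrived first: treat it as zero bytes received.
    waitForClose().then(() => ({ closed: true, length: 0 })),
  ]);
}
```

A caller could then assert that the reported `length` is less than the total sent, without requiring that any data arrive before the close.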

Updated

a year ago
Attachment #8786850 - Flags: review?(josh) → review-
Comment hidden (mozreview-request)
(Assignee)

Comment 22

a year ago
mozreview-review-reply
Comment on attachment 8786850 [details]
Bug 1295453 - Don't wait for data to be sent when closing socket immediately

https://reviewboard.mozilla.org/r/75734/#review74948

> This serverReceived value is unrelated to the sends that happened in this particular test, so we can't assert anything about this case now.
> 
> What if we make a waitForAnyDataOrClose helper which uses Promise.race to catch both cases, and treat the close case as a length of 0?

Ah, right. However, I don't think we want Promise.race here, since all we really want is to see the close message, at least as far as I understand things. I think this updated patch should work; let me know if I'm missing something.
Comment hidden (mozreview-request)

Comment 24

a year ago
mozreview-review
Comment on attachment 8786850 [details]
Bug 1295453 - Don't wait for data to be sent when closing socket immediately

https://reviewboard.mozilla.org/r/75734/#review75264

Thanks!
Attachment #8786850 - Flags: review?(josh) → review+
29 automation job failures were associated with this bug yesterday.

Repository breakdown:
* autoland: 14
* mozilla-inbound: 7
* try: 4
* fx-team: 3
* mozilla-central: 1

Platform breakdown:
* linux64: 26
* osx-10-10: 2
* windows7-32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-09-06&endday=2016-09-06&tree=all
Try run looks happy, so I'm going to try to land:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=417cc6083b7e

(Data is missing for runs on other platforms that I had hoped to have, probably because my try syntax was wrong, but I'm still pretty confident this won't break anything.)

Comment 27

a year ago
Pushed by wlachance@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3d0b41fdd93b
Don't wait for data to be sent when closing socket immediately r=jdm

Comment 28

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/3d0b41fdd93b
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
54 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 22
* autoland: 18
* try: 7
* fx-team: 5
* mozilla-central: 2

Platform breakdown:
* linux64: 43
* osx-10-10: 6
* windows7-32: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1295453&startday=2016-09-05&endday=2016-09-11&tree=all

Updated

a year ago
Duplicate of this bug: 1295625