Intermittent test_dns_cancel_wrap.js | test failed (with xpcshell return code: 0) | 0 == 2147500036

RESOLVED FIXED in Firefox 34

Status

()

defect
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: RyanVM, Assigned: dragana)

Tracking

({intermittent-failure})

Trunk
mozilla36
x86_64
Linux
Points:
---

Firefox Tracking Flags

(firefox34 fixed, firefox35 fixed, firefox36 fixed, firefox-esr31 unaffected)

Details

Attachments

(1 attachment, 1 obsolete attachment)

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2948021&repo=mozilla-inbound

builder 	Ubuntu ASAN VM 12.04 x64 mozilla-inbound opt test xpcshell
buildid 	20141012162554
builduid 	9bab84cfb9aa4cf1926776b576a431ff
results 	warnings (1)
revision 	facebebfea4d
slave 		tst-linux64-spot-398
starttime 	Sun Oct 12 2014 20:14:38 GMT-0400 (Eastern Standard Time)

18:27:41 INFO - TEST-INFO | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit_ipc/test_dns_cancel_wrap.js | running test ...
18:27:42 WARNING - TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit_ipc/test_dns_cancel_wrap.js | test failed (with xpcshell return code: 0), see following log:
18:27:42 INFO - >>>>>>>
18:27:42 INFO - TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | test run in child pending (2)
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (2)
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | running event loop
18:27:42 INFO - Xlib: extension "RANDR" missing on display ":0".
18:27:42 INFO - child: CHILD-TEST-STARTED
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test pending (2)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test pending (3)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test pending (4)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test pending (5)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test pending (6)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (6)
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | running event loop
18:27:42 INFO - child: TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | 0 == 2147500036 - See following stack:
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js:listener1.onLookupComplete:19
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_do_main:191
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:405
18:27:42 INFO - typein:null:0
18:27:42 INFO - null:null:0
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | exiting test
18:27:42 INFO - child: TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | 0 == 2147500036 - See following stack:
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js:listener2.onLookupComplete:43
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_do_main:194
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:405
18:27:42 INFO - typein:null:0
18:27:42 INFO - null:null:0
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | exiting test
18:27:42 INFO - child: TEST-PASS | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | [listener1.onLookupComplete : 24] 0 != 2147500036
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test finished (5)
18:27:42 INFO - child: TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | 0 == 2147500036 - See following stack:
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js:listener1.onLookupComplete:19
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_do_main:194
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:405
18:27:42 INFO - typein:null:0
18:27:42 INFO - null:null:0
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | exiting test
18:27:42 INFO - child: TEST-PASS | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | [listener2.onLookupComplete : 48] 0 != 2147500036
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test finished (4)
18:27:42 INFO - child: CHILD-TEST-COMPLETED
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | test finished (1)
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | exiting test
18:27:42 INFO - parent: TEST-INFO | (xpcshell/head.js) | No (+ 0) checks actually run
18:27:42 INFO - <<<<<<<
Assignee: nobody → dd.mozilla

This is probably a race issue, DNS query gets resolved before it gets canceled. We can not do much about it except remove the test if it is happening very often.

Anyway, I will check if something else has caused this failure.
Posted patch fix v1 (obsolete) — Splinter Review
Now using random strings to be resolved instead of fix ones.
Attachment #8504839 - Flags: review?(sworkman)
Comment on attachment 8504839 [details] [diff] [review]
fix v1

Review of attachment 8504839 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_dns_cancel.js
@@ +1,5 @@
>  var dns = Cc["@mozilla.org/network/dns-service;1"].getService(Ci.nsIDNSService);
>  
> +
> +var hostname1 = "";
> +var hostname2 ="";

missing space
Posted patch fix v2Splinter Review
Thanks Ryan!
Attachment #8504839 - Attachment is obsolete: true
Attachment #8504839 - Flags: review?(sworkman)
Attachment #8504848 - Flags: review?(sworkman)
Hi Dragana,

So, I assume that the random hostnames are to slow down the resolution? It would probably slow them down, but I'm not sure it would help in this situation. It looks like the problem is with the second hostname1 request, no? We want the callback for the first to succeed and the callback for the second to fail. Is it possible that cancel has got rid of the first request and allowed the second? Is this is the race you suspect?

In any case, the API is racy to begin with unfortunately. If the lookup is very quick, then cancel may not have a chance to delete the request. Sigh.

Tell me more about what you're expecting the random hostnames to help with. And if you think that simplifying the test to just calling cancelAsyncRequest and request.cancel for two hostnames would be better, then just do that and I'll r+ it.
(In reply to Steve Workman [:sworkman] from comment #6)
> Hi Dragana,
> 
> So, I assume that the random hostnames are to slow down the resolution? It
> would probably slow them down, but I'm not sure it would help in this
> situation. It looks like the problem is with the second hostname1 request,
> no? We want the callback for the first to succeed and the callback for the
> second to fail. Is it possible that cancel has got rid of the first request
> and allowed the second? Is this is the race you suspect?
> 

The cancel could not get rid of the first request. 
If cancel is called with dns.cancelAsyncRequest(...) one request will be cancel and we can't know which one.
Calling cancel on request it self will cancel exactly that request and exactly that request is removed from the list of the same requests (the same hostname, flags and listener) so the not canceled one must succeed.

From trace:

18:27:42 INFO - child: TEST-PASS | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | [listener1.onLookupComplete : 24] 0 != 2147500036
18:27:42 INFO - child: TEST-INFO | (xpcshell/head.js) | test finished (5)
18:27:42 INFO - child: TEST-UNEXPECTED-FAIL | /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js | 0 == 2147500036 - See following stack:
18:27:42 INFO - /builds/slave/test/build/tests/xpcshell/tests/netwerk/test/unit/test_dns_cancel.js:listener1.onLookupComplete:19

The first "TEST-PASS" is for the first request for hostname1 which returned 0 (succeeded as expected) and the second one should be canceled but it is not, because the resolver is too fast.

The race that i was talking about is between sending request, actually, the request getting resolved and canceling it. 

> In any case, the API is racy to begin with unfortunately. If the lookup is
> very quick, then cancel may not have a chance to delete the request. Sigh.
> 
> Tell me more about what you're expecting the random hostnames to help with.
> And if you think that simplifying the test to just calling
> cancelAsyncRequest and request.cancel for two hostnames would be better,
> then just do that and I'll r+ it.


The random hostnames take longer to be resolved, therefore, I expect that cancel will be faster than the resolver.
I have made the test in this way to test if the right request gets canceled (there are 2 request for the same host and flags from different listeners and 2 request from the same listener for the same hostname,flags the right one should be canceled)

I can't 100% guaranty that it is not going to fail again. If it starts failing, I will remove everything except two cancel calls as you suggested but that also does not remove the race. So removing the test is the last option.

Thanks Steve
Comment on attachment 8504848 [details] [diff] [review]
fix v2

Review of attachment 8504848 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good! Thanks for the explanation! r=me.
Attachment #8504848 - Flags: review?(sworkman) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/4f9432db3e77
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.