Closed Bug 1640724 Opened 5 years ago Closed 5 years ago

Intermittent netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking status - 2152398878 == 0

Categories

(Core :: Networking: DNS, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox78 --- fixed
firefox79 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: valentin)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [necko-triaged][retriggered][stockwell disable-recommended] )

Attachments

(2 files, 1 obsolete file)

Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=303668276&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MMzeaq5KT8iZhntjSZXT-w/runs/0/artifacts/public/logs/live_backing.log


[task 2020-05-25T18:20:05.757Z] 18:20:05 INFO - TEST-START | netwerk/test/unit/test_trr_case_sensitivity.js
[task 2020-05-25T18:20:38.683Z] 18:20:38 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr_case_sensitivity.js | xpcshell return code: 0
[task 2020-05-25T18:20:38.683Z] 18:20:38 INFO - TEST-INFO took 32922ms
[task 2020-05-25T18:20:38.683Z] 18:20:38 INFO - >>>>>>>
[task 2020-05-25T18:20:38.683Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp, line 2924
[task 2020-05-25T18:20:38.683Z] 18:20:38 INFO - PID 5549 | start!
[task 2020-05-25T18:20:38.683Z] 18:20:38 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - running event loop
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - PID 5549 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/checkouts/gecko/dom/media/CubebUtils.cpp, line 375
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - netwerk/test/unit/test_trr_case_sensitivity.js | Starting test_trr_casing
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - (xpcshell/head.js) | test test_trr_casing pending (2)
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, kKnownEsrVersion) failed with result 0x80004002 (NS_NOINTERFACE): file /builds/worker/checkouts/gecko/toolkit/components/resistfingerprinting/nsRFPService.cpp, line 673
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2020-05-25T18:20:38.684Z] 18:20:38 INFO - PID 5549 | port = 57921
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] 44 == 44
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 42] "<h1> 404 Path not found: /test?bla=some</h1>" == "<h1> 404 Path not found: /test?bla=some</h1>"
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking that this is the correct callback - true == true
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking status - 0 == 0
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking result for a.example.com - "1.2.3.4" == "1.2.3.4"
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking that this is the correct callback - true == true
[task 2020-05-25T18:20:38.685Z] 18:20:38 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking status - 2152398878 == 0
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - -e:null:1
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - exiting test
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - PID 5549 | JavaScript error: /Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js, line 833: NS_ERROR_ABORT:
[task 2020-05-25T18:20:38.685Z] 18:20:38 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "NS_ERROR_ABORT: " {file: "/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js" line: 833}]
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - _abort_failed_test@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js:833:20
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - do_report_result@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js:934:5
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - Assert<@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js:73:21
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - proto.report@resource://testing-common/Assert.jsm:233:10
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - equal@resource://testing-common/Assert.jsm:275:8
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - onLookupComplete@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/tests/netwerk/test/unit/head_trr.js:144:12
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - _do_main@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js:248:6
[task 2020-05-25T18:20:38.686Z] 18:20:38 INFO - _execute_test@/Users/cltbld/tasks/task_1590428687/build/tests/xpcshell/head.js:577:5
[task 2020-05-25T18:20:38.687Z] 18:20:38 INFO - @-e:1:1
[task 2020-05-25T18:20:38.687Z] 18:20:38 INFO - "
[task 2020-05-25T18:20:38.687Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp, line 3359
[task 2020-05-25T18:20:38.687Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: OOPDeinit() without successful OOPInit(): file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp, line 3370
[task 2020-05-25T18:20:38.687Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp, line 202
[task 2020-05-25T18:20:38.688Z] 18:20:38 INFO - PID 5549 | [5549, Main Thread] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /builds/worker/checkouts/gecko/xpcom/base/nsTraceRefcnt.cpp, line 202
[task 2020-05-25T18:20:38.688Z] 18:20:38 INFO - <<<<<<<

Seems like the test is racy.

Assignee: nobody → valentin.gosu
Severity: normal → S3
Priority: P5 → P2
Whiteboard: [necko-triaged]
Flags: needinfo?(valentin.gosu)
Regressed by: 1635566
Whiteboard: [necko-triaged] → [necko-triaged][retriggered]
Has Regression Range: --- → yes
Keywords: regression

There are 75 failures in the last 7 days on linux1804-64 opt,debug; inux1804-64-shippable opt; macosx1014-64 debug
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=305274742&repo=mozilla-central&lineNumber=5119

''' [task 2020-06-05T22:46:43.760Z] 22:46:43 INFO - TEST-START | netwerk/test/unit/test_trr_case_sensitivity.js
[task 2020-06-05T22:47:14.084Z] 22:47:14 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr_case_sensitivity.js | xpcshell return code: 0
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-INFO took 30322ms
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - >>>>>>>
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - PID 19571 | start!
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - running event loop
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - netwerk/test/unit/test_trr_case_sensitivity.js | Starting test_trr_casing
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - (xpcshell/head.js) | test test_trr_casing pending (2)
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - PID 19571 | port = 34504
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] 44 == 44
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 42] "<h1> 404 Path not found: /test?bla=some</h1>" == "<h1> 404 Path not found: /test?bla=some</h1>"
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking that this is the correct callback - true == true
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking status - 0 == 0
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking result for a.example.com - "1.2.3.4" == "1.2.3.4"
[task 2020-06-05T22:47:14.084Z] 22:47:14 INFO - TEST-PASS | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking that this is the correct callback - true == true
[task 2020-06-05T22:47:14.084Z] 22:47:14 WARNING - TEST-UNEXPECTED-FAIL | netwerk/test/unit/test_trr_case_sensitivity.js | test_trr_casing - [test_trr_casing : 1] Checking status - 2152398878 == 0'''

Flags: needinfo?(valentin.gosu)
Flags: needinfo?(valentin.gosu)

I'm actively working on the bug. I should be able to figure it out in the next day or so.

Flags: needinfo?(valentin.gosu)

This patch is a workaround for an issue that causes an intermittent failure
in test_trr_case_sensitivity.js.
I was able to reproduce the bug locally with some logging, but not with
nsHttp:5 or under rr, which made it difficult to diagnose the root cause.
What I was able to determine using logging and timestamps in the nodejs code
is that we would not get the "end" event for a request, to which we were
reacting to send back the DoH response. The request has a timer, and
eventually that timer would fire and cancel the request. At that point
we would see the "end" event in nodejs but it's too late to actually process
the response.

It's not clear if this was a bug in Firefox's HTTP2 implementation
(maybe the end isn't signaled properly) or in nodejs.

This fix makes it so we send back the response when the number of bytes
specified in contentLength reaches the server.
We should investigate in a follow-up bug if there's another Necko bug
involved here.

Depends on: 1644206
Attached file repro_decrypt.pcapng (obsolete) —

Comment on attachment 9154997 [details]
repro_decrypt.pcapng

Wanted to attach it in bug 1644206 instead.

Attachment #9154997 - Attachment is obsolete: true
Pushed by valentin.gosu@gmail.com: https://hg.mozilla.org/integration/autoland/rev/af65ea1369ff head_trr.js: Emit DoH response when we got all the bytes instead of "end" event r=dragana,necko-reviewers
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Comment on attachment 9154923 [details]
Bug 1640724 - head_trr.js: Emit DoH response when we got all the bytes instead of "end" event r=dragana

Beta/Release Uplift Approval Request

  • User impact if declined: No user impact. This is a test-only change that works around a bug in the nodejs HTTP2 implementation that causes intermittent failures in xpcshell-tests
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Only affects the failure of test_trr_case_sensitivity.js in automation.
  • String changes made/needed:
Attachment #9154923 - Flags: approval-mozilla-beta?

Comment on attachment 9154923 [details]
Bug 1640724 - head_trr.js: Emit DoH response when we got all the bytes instead of "end" event r=dragana

This can land a=test-only.

Attachment #9154923 - Flags: approval-mozilla-beta?
Whiteboard: [necko-triaged][retriggered][stockwell disable-recommended] → [necko-triaged][retriggered][stockwell disable-recommended][checkin-needed-beta]

Actually those tests are now disabled in beta and we don't have bug 1641496 there, so we might be better off leaving things as they are?

Whiteboard: [necko-triaged][retriggered][stockwell disable-recommended][checkin-needed-beta] → [necko-triaged][retriggered][stockwell disable-recommended]

This patch is a workaround for an issue that causes an intermittent failure
in test_trr_case_sensitivity.js.
I was able to reproduce the bug locally with some logging, but not with
nsHttp:5 or under rr, which made it difficult to diagnose the root cause.
What I was able to determine using logging and timestamps in the nodejs code
is that we would not get the "end" event for a request, to which we were
reacting to send back the DoH response. The request has a timer, and
eventually that timer would fire and cancel the request. At that point
we would see the "end" event in nodejs but it's too late to actually process
the response.

It's not clear if this was a bug in Firefox's HTTP2 implementation
(maybe the end isn't signaled properly) or in nodejs.

This fix makes it so we send back the response when the number of bytes
specified in contentLength reaches the server.
We should investigate in a follow-up bug if there's another Necko bug
involved here.

(In reply to Julien Cristau [:jcristau] from comment #18)

Actually those tests are now disabled in beta and we don't have bug 1641496 there, so we might be better off leaving things as they are?

This addresses the failures xpcshell-test and verify failures on all platforms. It should stay disabled for socket-process.

Whiteboard: [necko-triaged][retriggered][stockwell disable-recommended] → [necko-triaged][retriggered][stockwell disable-recommended][checkin-needed-beta]
Whiteboard: [necko-triaged][retriggered][stockwell disable-recommended][checkin-needed-beta] → [necko-triaged][retriggered][stockwell disable-recommended]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: