Closed Bug 1242755 Opened 4 years ago Closed 4 years ago

Investigate socketTransport hangs during shutdown

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: dragana, Assigned: dragana)

References

Details

Attachments

(1 file)

Hangs:
https://crash-stats.mozilla.com/report/index/eb93c0ba-d859-4e54-8b14-240f72160124#allthreads
https://crash-stats.mozilla.com/report/index/1e1ea50f-c28e-484a-95b1-9a40f2160124#allthreads
https://crash-stats.mozilla.com/report/index/3af324df-4f9f-4ec5-8397-c8ca92160123#allthreads

It would be good to know why is it hanging here.

Maybe a quick solution would be to change wait time in PR_Poll to something like 2min, so that will wake up socket Thread not too often.
Bug 1241906 could be of interest here as well.
Short explanation what I was planning here:

This crash happens approximately once per day on Nightly, so we can use Nightly look for an error.

I was planning to add a lot of checks, e.g. did go off/online happen short before that, lost connectivity, make somehow flag that we did get event but PR_Poll have not wake up etc. (I am still thinking about this). 
This I will push only to nightly and take it out after couple of days.

I want to do bit of research before the patch first.
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Summary: Investigate PR_Poll hangs during shutdown → Investigate socketTransport hangs during shutdown
I have change the title. There are some hangs that are now more often I would say and most of them are connected to tls and http2 and pipes...

A short summary:
1) http2 and in/outPipe (there was a strange bug (bug 1153929) for which we have added the check at http://hg.mozilla.org/mozilla-central/annotate/3f41d7d0f544/netwerk/protocol/http/nsHttpTransaction.cpp#l806 the hangs is couple of line down (line 816)):

https://crash-stats.mozilla.com/report/index/4295bd65-0a7a-401d-867b-ebe442160125#allthreads
https://crash-stats.mozilla.com/report/index/77f94d70-81b6-48bf-8d9b-879282160125#allthreads
https://crash-stats.mozilla.com/report/index/0065cec6-b08d-4d58-9ab9-f64d82160124#allthreads
https://crash-stats.mozilla.com/report/index/8544bb17-32d9-4332-8ea5-5c0c92160130#allthreads
https://crash-stats.mozilla.com/report/index/bab91611-9442-47ad-8177-75a0b2160130#allthreads
https://crash-stats.mozilla.com/report/index/436b1a8d-bc23-4bd0-9d1b-6309f2160125#allthreads
https://crash-stats.mozilla.com/report/index/6024a5f8-3e7d-46f8-b2f7-953e12160126#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/5353c8ca-fb68-4cca-b58d-aa9f92160128#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/4e1df800-7f9d-4998-a915-bce412160128#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/617ff774-5ff7-49e6-a24e-12ff32160130#allthreads
(probably) https://crash-stats.mozilla.com/report/index/c6660647-8451-45cc-ae73-1dfd42160130#allthreads
(probably) https://crash-stats.mozilla.com/report/index/4b1f5adf-49a7-430f-ba56-7b7bd2160125#allthreads

1) Http2
https://crash-stats.mozilla.com/report/index/09302986-75ac-405a-86ad-981412160130#allthreads
https://crash-stats.mozilla.com/report/index/a5e7c4db-1feb-4e04-9848-a11192160130#allthreads
https://crash-stats.mozilla.com/report/index/693ce3d3-ada3-49af-a522-a31102160129#allthreads

2) tls h2/spdy:
https://crash-stats.mozilla.com/report/index/b661d3cc-9956-4e5c-83d7-0b7a92160128#allthreads
https://crash-stats.mozilla.com/report/index/40a304c2-e498-4752-8526-fae822160130#allthreads
https://crash-stats.mozilla.com/report/index/be38ed80-41c5-4a6c-8185-548222160130#allthreads
https://crash-stats.mozilla.com/report/index/0b2e9976-a920-444f-a625-25be52160130#allthreads
https://crash-stats.mozilla.com/report/index/2346ab0a-023d-4239-b04c-0e6952160130#allthreads


3) nss:
https://crash-stats.mozilla.com/report/index/447be452-e2a8-4d44-a248-4d67e2160129#allthreads
https://crash-stats.mozilla.com/report/index/98fbcd45-522d-4adb-8c71-fae9b2160125#allthreads


Some others:
https://crash-stats.mozilla.com/report/index/76a30f3f-eeca-44a8-9c54-65a3f2160129#allthreads
https://crash-stats.mozilla.com/report/index/163686ab-a0ad-4913-b778-51d002160126#allthreads
https://crash-stats.mozilla.com/report/index/9534195b-6ff6-43a7-93d5-0e25c2160125#allthreads


They are not high volume crashes but still...

The only bigger change that I have made is rearranging of call to nsHttpConnectionMgr shutdown to be call a bit later, but still before socketTransport shutdown. Some of this crashes were happening before this change as well.
Blocks: 1238910
I will make a patch that reverse the change made in bug 1238910 and add a bit of a hack to inform nsIOService that we are in shutdown earlier.
Attachment #8715769 - Flags: review?(mcmanus)
Attachment #8715769 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
Bug 1238910 landed on 46 so i am marking it as affected. (I will ask for an uplift next week)
Backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f80bc181d451 because of xpcshell failures on Linux x64 in a push test related to a failing promise on shutdown.

Backout jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f80bc181d451

Failing jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4f4f57c87427
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=21255857&repo=mozilla-inbound

02:55:46     INFO -  TEST-START | dom/push/test/xpcshell/test_notification_http2.js
02:55:48  WARNING -  TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | xpcshell return code: 0
02:55:48     INFO -  TEST-INFO took 2011ms
02:55:48     INFO -  >>>>>>>
02:55:48     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
02:55:48     INFO -  TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | run_test - [run_test : 18] "53743" != null
02:55:48     INFO -  PROCESS | 12784 | using port 53743
02:55:48     INFO -  PROCESS | 12784 | Certificate Override in place
02:55:48     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "localhost:53743 uses an invalid security certificate.
02:55:48     INFO -  The certificate is not trusted because the issuer certificate is unknown.
02:55:48     INFO -  The server might not be sending the appropriate intermediate certificates.
02:55:48     INFO -  An additional root certificate may need to be imported.
02:55:48     INFO -  The certificate is only valid for foo.example.com
02:55:48     INFO -  Error code: <a id="errorCode" title="SEC_ERROR_UNKNOWN_ISSUER">SEC_ERROR_UNKNOWN_ISSUER</a>
02:55:48     INFO -  "]"
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
02:55:48     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
02:55:48     INFO -  running event loop
02:55:48     INFO -  dom/push/test/xpcshell/test_notification_http2.js | Starting test_pushNotifications
02:55:48     INFO -  (xpcshell/head.js) | test test_pushNotifications pending (2)
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
02:55:48     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property Cc['@mozilla.org/network/manager;1']" {file: "resource://gre/modules/PushService.jsm" line: 324}]"
02:55:48     INFO -  TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 127] decoded message is incorrect - "Some message" == "Some message"
02:55:48     INFO -  TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 134] decoded message is incorrect - "Some message" == "Some message"
02:55:48     INFO -  TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 141] decoded message is incorrect - "Some message" == "Some message"
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 1 pending (2)
02:55:48     INFO -  (xpcshell/head.js) | test test_pushNotifications finished (2)
02:55:48     INFO -  dom/push/test/xpcshell/test_notification_http2.js | Starting test_complete
02:55:48     INFO -  (xpcshell/head.js) | test test_complete pending (2)
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 1 finished (2)
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 2 pending (2)
02:55:48     INFO -  (xpcshell/head.js) | test test_complete finished (2)
02:55:48     INFO -  (xpcshell/head.js) | test run_next_test 2 finished (1)
02:55:48     INFO -  exiting test
02:55:48     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property txn.result" {file: "resource://gre/modules/IndexedDBHelper.jsm" line: 153}]"
02:55:48  WARNING -  TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | test_complete - [test_complete : 221] A promise chain failed to handle a rejection: Error: Error removing subscription: 2147500036 - rejection date: Mon Feb 08 2016 02:55:47 GMT-0800 (PST) - stack: PushServiceDelete.prototype.onStopRequest@resource://gre/modules/PushServiceHttp2.jsm:207:21
02:55:48     INFO -  observe@/builds/slave/test/build/tests/xpcshell/tests/dom/push/test/xpcshell/head.js:41:7
02:55:48     INFO -  _execute_test@/builds/slave/test/build/tests/xpcshell/head.js:614:5
02:55:48     INFO -  @-e:1:1
02:55:48     INFO -   - false == true
02:55:48     INFO -  resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:221
02:55:48     INFO -  /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:623
02:55:48     INFO -  -e:null:1
02:55:48     INFO -  exiting test
02:55:48     INFO -  <<<<<<<
Flags: needinfo?(dd.mozilla)
Depends on: 1246632
Thanks.

This is a bug in PushServiceHttp2.js I file a bug and made a patch.
Flags: needinfo?(dd.mozilla)
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/28069d20b125
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8715769 [details] [diff] [review]
bug_1242755_v1.patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1238910
[User impact if declined]: Bug 1238910 changed how nsHttpConnectionMgr is shut down, but this cause some shutdown hangs mostly in nss. e.g https://crash-stats.mozilla.com/report/index/b6075322-659b-4d92-b28b-a30202160208#allthreads 
[Describe test coverage new/current, TreeHerder]: This code path is executed during any shutdown. 
[Risks and why]: Low risk it reverse the behavior as it was before bug 1238910 landed
[String/UUID change made/needed]:none
Attachment #8715769 - Flags: approval-mozilla-aurora?
Comment on attachment 8715769 [details] [diff] [review]
bug_1242755_v1.patch

Backs out bug 1238910, to prevent shutdown hangs.
Please uplift to aurora
Attachment #8715769 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 1158189
You need to log in before you can comment on or make changes to this bug.