Closed Bug 1242755 Opened 9 years ago Closed 9 years ago

Investigate socketTransport hangs during shutdown

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: dragana, Assigned: dragana)

References

Details

Attachments

(1 file)

Hangs: https://crash-stats.mozilla.com/report/index/eb93c0ba-d859-4e54-8b14-240f72160124#allthreads https://crash-stats.mozilla.com/report/index/1e1ea50f-c28e-484a-95b1-9a40f2160124#allthreads https://crash-stats.mozilla.com/report/index/3af324df-4f9f-4ec5-8397-c8ca92160123#allthreads It would be good to know why is it hanging here. Maybe a quick solution would be to change wait time in PR_Poll to something like 2min, so that will wake up socket Thread not too often.
Bug 1241906 could be of interest here as well.
Short explanation what I was planning here: This crash happens approximately once per day on Nightly, so we can use Nightly look for an error. I was planning to add a lot of checks, e.g. did go off/online happen short before that, lost connectivity, make somehow flag that we did get event but PR_Poll have not wake up etc. (I am still thinking about this). This I will push only to nightly and take it out after couple of days. I want to do bit of research before the patch first.
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Summary: Investigate PR_Poll hangs during shutdown → Investigate socketTransport hangs during shutdown
I have change the title. There are some hangs that are now more often I would say and most of them are connected to tls and http2 and pipes... A short summary: 1) http2 and in/outPipe (there was a strange bug (bug 1153929) for which we have added the check at http://hg.mozilla.org/mozilla-central/annotate/3f41d7d0f544/netwerk/protocol/http/nsHttpTransaction.cpp#l806 the hangs is couple of line down (line 816)): https://crash-stats.mozilla.com/report/index/4295bd65-0a7a-401d-867b-ebe442160125#allthreads https://crash-stats.mozilla.com/report/index/77f94d70-81b6-48bf-8d9b-879282160125#allthreads https://crash-stats.mozilla.com/report/index/0065cec6-b08d-4d58-9ab9-f64d82160124#allthreads https://crash-stats.mozilla.com/report/index/8544bb17-32d9-4332-8ea5-5c0c92160130#allthreads https://crash-stats.mozilla.com/report/index/bab91611-9442-47ad-8177-75a0b2160130#allthreads https://crash-stats.mozilla.com/report/index/436b1a8d-bc23-4bd0-9d1b-6309f2160125#allthreads https://crash-stats.mozilla.com/report/index/6024a5f8-3e7d-46f8-b2f7-953e12160126#allthreads (maybe) https://crash-stats.mozilla.com/report/index/5353c8ca-fb68-4cca-b58d-aa9f92160128#allthreads (maybe) https://crash-stats.mozilla.com/report/index/4e1df800-7f9d-4998-a915-bce412160128#allthreads (maybe) https://crash-stats.mozilla.com/report/index/617ff774-5ff7-49e6-a24e-12ff32160130#allthreads (probably) https://crash-stats.mozilla.com/report/index/c6660647-8451-45cc-ae73-1dfd42160130#allthreads (probably) https://crash-stats.mozilla.com/report/index/4b1f5adf-49a7-430f-ba56-7b7bd2160125#allthreads 1) Http2 https://crash-stats.mozilla.com/report/index/09302986-75ac-405a-86ad-981412160130#allthreads https://crash-stats.mozilla.com/report/index/a5e7c4db-1feb-4e04-9848-a11192160130#allthreads https://crash-stats.mozilla.com/report/index/693ce3d3-ada3-49af-a522-a31102160129#allthreads 2) tls h2/spdy: https://crash-stats.mozilla.com/report/index/b661d3cc-9956-4e5c-83d7-0b7a92160128#allthreads https://crash-stats.mozilla.com/report/index/40a304c2-e498-4752-8526-fae822160130#allthreads https://crash-stats.mozilla.com/report/index/be38ed80-41c5-4a6c-8185-548222160130#allthreads https://crash-stats.mozilla.com/report/index/0b2e9976-a920-444f-a625-25be52160130#allthreads https://crash-stats.mozilla.com/report/index/2346ab0a-023d-4239-b04c-0e6952160130#allthreads 3) nss: https://crash-stats.mozilla.com/report/index/447be452-e2a8-4d44-a248-4d67e2160129#allthreads https://crash-stats.mozilla.com/report/index/98fbcd45-522d-4adb-8c71-fae9b2160125#allthreads Some others: https://crash-stats.mozilla.com/report/index/76a30f3f-eeca-44a8-9c54-65a3f2160129#allthreads https://crash-stats.mozilla.com/report/index/163686ab-a0ad-4913-b778-51d002160126#allthreads https://crash-stats.mozilla.com/report/index/9534195b-6ff6-43a7-93d5-0e25c2160125#allthreads They are not high volume crashes but still... The only bigger change that I have made is rearranging of call to nsHttpConnectionMgr shutdown to be call a bit later, but still before socketTransport shutdown. Some of this crashes were happening before this change as well.
Blocks: 1238910
I will make a patch that reverse the change made in bug 1238910 and add a bit of a hack to inform nsIOService that we are in shutdown earlier.
Attachment #8715769 - Flags: review?(mcmanus)
Attachment #8715769 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
Bug 1238910 landed on 46 so i am marking it as affected. (I will ask for an uplift next week)
Backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f80bc181d451 because of xpcshell failures on Linux x64 in a push test related to a failing promise on shutdown. Backout jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f80bc181d451 Failing jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4f4f57c87427 Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=21255857&repo=mozilla-inbound 02:55:46 INFO - TEST-START | dom/push/test/xpcshell/test_notification_http2.js 02:55:48 WARNING - TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | xpcshell return code: 0 02:55:48 INFO - TEST-INFO took 2011ms 02:55:48 INFO - >>>>>>> 02:55:48 INFO - (xpcshell/head.js) | test MAIN run_test pending (1) 02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | run_test - [run_test : 18] "53743" != null 02:55:48 INFO - PROCESS | 12784 | using port 53743 02:55:48 INFO - PROCESS | 12784 | Certificate Override in place 02:55:48 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "localhost:53743 uses an invalid security certificate. 02:55:48 INFO - The certificate is not trusted because the issuer certificate is unknown. 02:55:48 INFO - The server might not be sending the appropriate intermediate certificates. 02:55:48 INFO - An additional root certificate may need to be imported. 02:55:48 INFO - The certificate is only valid for foo.example.com 02:55:48 INFO - Error code: <a id="errorCode" title="SEC_ERROR_UNKNOWN_ISSUER">SEC_ERROR_UNKNOWN_ISSUER</a> 02:55:48 INFO - "]" 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2) 02:55:48 INFO - (xpcshell/head.js) | test MAIN run_test finished (2) 02:55:48 INFO - running event loop 02:55:48 INFO - dom/push/test/xpcshell/test_notification_http2.js | Starting test_pushNotifications 02:55:48 INFO - (xpcshell/head.js) | test test_pushNotifications pending (2) 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2) 02:55:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property Cc['@mozilla.org/network/manager;1']" {file: "resource://gre/modules/PushService.jsm" line: 324}]" 02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 127] decoded message is incorrect - "Some message" == "Some message" 02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 134] decoded message is incorrect - "Some message" == "Some message" 02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 141] decoded message is incorrect - "Some message" == "Some message" 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 1 pending (2) 02:55:48 INFO - (xpcshell/head.js) | test test_pushNotifications finished (2) 02:55:48 INFO - dom/push/test/xpcshell/test_notification_http2.js | Starting test_complete 02:55:48 INFO - (xpcshell/head.js) | test test_complete pending (2) 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 1 finished (2) 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 2 pending (2) 02:55:48 INFO - (xpcshell/head.js) | test test_complete finished (2) 02:55:48 INFO - (xpcshell/head.js) | test run_next_test 2 finished (1) 02:55:48 INFO - exiting test 02:55:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property txn.result" {file: "resource://gre/modules/IndexedDBHelper.jsm" line: 153}]" 02:55:48 WARNING - TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | test_complete - [test_complete : 221] A promise chain failed to handle a rejection: Error: Error removing subscription: 2147500036 - rejection date: Mon Feb 08 2016 02:55:47 GMT-0800 (PST) - stack: PushServiceDelete.prototype.onStopRequest@resource://gre/modules/PushServiceHttp2.jsm:207:21 02:55:48 INFO - observe@/builds/slave/test/build/tests/xpcshell/tests/dom/push/test/xpcshell/head.js:41:7 02:55:48 INFO - _execute_test@/builds/slave/test/build/tests/xpcshell/head.js:614:5 02:55:48 INFO - @-e:1:1 02:55:48 INFO - - false == true 02:55:48 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:221 02:55:48 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:623 02:55:48 INFO - -e:null:1 02:55:48 INFO - exiting test 02:55:48 INFO - <<<<<<<
Flags: needinfo?(dd.mozilla)
Depends on: 1246632
Thanks. This is a bug in PushServiceHttp2.js I file a bug and made a patch.
Flags: needinfo?(dd.mozilla)
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Comment on attachment 8715769 [details] [diff] [review] bug_1242755_v1.patch Approval Request Comment [Feature/regressing bug #]: Bug 1238910 [User impact if declined]: Bug 1238910 changed how nsHttpConnectionMgr is shut down, but this cause some shutdown hangs mostly in nss. e.g https://crash-stats.mozilla.com/report/index/b6075322-659b-4d92-b28b-a30202160208#allthreads [Describe test coverage new/current, TreeHerder]: This code path is executed during any shutdown. [Risks and why]: Low risk it reverse the behavior as it was before bug 1238910 landed [String/UUID change made/needed]:none
Attachment #8715769 - Flags: approval-mozilla-aurora?
Comment on attachment 8715769 [details] [diff] [review] bug_1242755_v1.patch Backs out bug 1238910, to prevent shutdown hangs. Please uplift to aurora
Attachment #8715769 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Blocks: 1158189
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: