Closed
Bug 1242755
Opened 9 years ago
Closed 9 years ago
Investigate socketTransport hangs during shutdown
Categories
(Core :: Networking, defect)
Core
Networking
Tracking
()
RESOLVED
FIXED
mozilla47
People
(Reporter: dragana, Assigned: dragana)
References
Details
Attachments
(1 file)
8.46 KB,
patch
|
mcmanus
:
review+
lizzard
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Hangs:
https://crash-stats.mozilla.com/report/index/eb93c0ba-d859-4e54-8b14-240f72160124#allthreads
https://crash-stats.mozilla.com/report/index/1e1ea50f-c28e-484a-95b1-9a40f2160124#allthreads
https://crash-stats.mozilla.com/report/index/3af324df-4f9f-4ec5-8397-c8ca92160123#allthreads
It would be good to know why is it hanging here.
Maybe a quick solution would be to change wait time in PR_Poll to something like 2min, so that will wake up socket Thread not too often.
Assignee | ||
Comment 1•9 years ago
|
||
Bug 1241906 could be of interest here as well.
Assignee | ||
Comment 2•9 years ago
|
||
Short explanation what I was planning here:
This crash happens approximately once per day on Nightly, so we can use Nightly look for an error.
I was planning to add a lot of checks, e.g. did go off/online happen short before that, lost connectivity, make somehow flag that we did get event but PR_Poll have not wake up etc. (I am still thinking about this).
This I will push only to nightly and take it out after couple of days.
I want to do bit of research before the patch first.
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → dd.mozilla
Status: NEW → ASSIGNED
Assignee | ||
Updated•9 years ago
|
Summary: Investigate PR_Poll hangs during shutdown → Investigate socketTransport hangs during shutdown
Assignee | ||
Comment 3•9 years ago
|
||
I have change the title. There are some hangs that are now more often I would say and most of them are connected to tls and http2 and pipes...
A short summary:
1) http2 and in/outPipe (there was a strange bug (bug 1153929) for which we have added the check at http://hg.mozilla.org/mozilla-central/annotate/3f41d7d0f544/netwerk/protocol/http/nsHttpTransaction.cpp#l806 the hangs is couple of line down (line 816)):
https://crash-stats.mozilla.com/report/index/4295bd65-0a7a-401d-867b-ebe442160125#allthreads
https://crash-stats.mozilla.com/report/index/77f94d70-81b6-48bf-8d9b-879282160125#allthreads
https://crash-stats.mozilla.com/report/index/0065cec6-b08d-4d58-9ab9-f64d82160124#allthreads
https://crash-stats.mozilla.com/report/index/8544bb17-32d9-4332-8ea5-5c0c92160130#allthreads
https://crash-stats.mozilla.com/report/index/bab91611-9442-47ad-8177-75a0b2160130#allthreads
https://crash-stats.mozilla.com/report/index/436b1a8d-bc23-4bd0-9d1b-6309f2160125#allthreads
https://crash-stats.mozilla.com/report/index/6024a5f8-3e7d-46f8-b2f7-953e12160126#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/5353c8ca-fb68-4cca-b58d-aa9f92160128#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/4e1df800-7f9d-4998-a915-bce412160128#allthreads
(maybe) https://crash-stats.mozilla.com/report/index/617ff774-5ff7-49e6-a24e-12ff32160130#allthreads
(probably) https://crash-stats.mozilla.com/report/index/c6660647-8451-45cc-ae73-1dfd42160130#allthreads
(probably) https://crash-stats.mozilla.com/report/index/4b1f5adf-49a7-430f-ba56-7b7bd2160125#allthreads
1) Http2
https://crash-stats.mozilla.com/report/index/09302986-75ac-405a-86ad-981412160130#allthreads
https://crash-stats.mozilla.com/report/index/a5e7c4db-1feb-4e04-9848-a11192160130#allthreads
https://crash-stats.mozilla.com/report/index/693ce3d3-ada3-49af-a522-a31102160129#allthreads
2) tls h2/spdy:
https://crash-stats.mozilla.com/report/index/b661d3cc-9956-4e5c-83d7-0b7a92160128#allthreads
https://crash-stats.mozilla.com/report/index/40a304c2-e498-4752-8526-fae822160130#allthreads
https://crash-stats.mozilla.com/report/index/be38ed80-41c5-4a6c-8185-548222160130#allthreads
https://crash-stats.mozilla.com/report/index/0b2e9976-a920-444f-a625-25be52160130#allthreads
https://crash-stats.mozilla.com/report/index/2346ab0a-023d-4239-b04c-0e6952160130#allthreads
3) nss:
https://crash-stats.mozilla.com/report/index/447be452-e2a8-4d44-a248-4d67e2160129#allthreads
https://crash-stats.mozilla.com/report/index/98fbcd45-522d-4adb-8c71-fae9b2160125#allthreads
Some others:
https://crash-stats.mozilla.com/report/index/76a30f3f-eeca-44a8-9c54-65a3f2160129#allthreads
https://crash-stats.mozilla.com/report/index/163686ab-a0ad-4913-b778-51d002160126#allthreads
https://crash-stats.mozilla.com/report/index/9534195b-6ff6-43a7-93d5-0e25c2160125#allthreads
They are not high volume crashes but still...
The only bigger change that I have made is rearranging of call to nsHttpConnectionMgr shutdown to be call a bit later, but still before socketTransport shutdown. Some of this crashes were happening before this change as well.
Assignee | ||
Comment 4•9 years ago
|
||
1) and 2) can be that we are spinning in do-while loop:
http://hg.mozilla.org/mozilla-central/annotate/211a4c710fb6/netwerk/protocol/http/nsHttpConnection.cpp#l1768
Assignee | ||
Comment 5•9 years ago
|
||
I will make a patch that reverse the change made in bug 1238910 and add a bit of a hack to inform nsIOService that we are in shutdown earlier.
Assignee | ||
Comment 6•9 years ago
|
||
Assignee | ||
Comment 7•9 years ago
|
||
Attachment #8715769 -
Flags: review?(mcmanus)
Updated•9 years ago
|
Attachment #8715769 -
Flags: review?(mcmanus) → review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Assignee | ||
Comment 8•9 years ago
|
||
Bug 1238910 landed on 46 so i am marking it as affected. (I will ask for an uplift next week)
status-firefox46:
--- → affected
Keywords: checkin-needed
![]() |
||
Comment 10•9 years ago
|
||
Backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f80bc181d451 because of xpcshell failures on Linux x64 in a push test related to a failing promise on shutdown.
Backout jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f80bc181d451
Failing jobs: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4f4f57c87427
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=21255857&repo=mozilla-inbound
02:55:46 INFO - TEST-START | dom/push/test/xpcshell/test_notification_http2.js
02:55:48 WARNING - TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | xpcshell return code: 0
02:55:48 INFO - TEST-INFO took 2011ms
02:55:48 INFO - >>>>>>>
02:55:48 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | run_test - [run_test : 18] "53743" != null
02:55:48 INFO - PROCESS | 12784 | using port 53743
02:55:48 INFO - PROCESS | 12784 | Certificate Override in place
02:55:48 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "localhost:53743 uses an invalid security certificate.
02:55:48 INFO - The certificate is not trusted because the issuer certificate is unknown.
02:55:48 INFO - The server might not be sending the appropriate intermediate certificates.
02:55:48 INFO - An additional root certificate may need to be imported.
02:55:48 INFO - The certificate is only valid for foo.example.com
02:55:48 INFO - Error code: <a id="errorCode" title="SEC_ERROR_UNKNOWN_ISSUER">SEC_ERROR_UNKNOWN_ISSUER</a>
02:55:48 INFO - "]"
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 0 pending (2)
02:55:48 INFO - (xpcshell/head.js) | test MAIN run_test finished (2)
02:55:48 INFO - running event loop
02:55:48 INFO - dom/push/test/xpcshell/test_notification_http2.js | Starting test_pushNotifications
02:55:48 INFO - (xpcshell/head.js) | test test_pushNotifications pending (2)
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 0 finished (2)
02:55:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property Cc['@mozilla.org/network/manager;1']" {file: "resource://gre/modules/PushService.jsm" line: 324}]"
02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 127] decoded message is incorrect - "Some message" == "Some message"
02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 134] decoded message is incorrect - "Some message" == "Some message"
02:55:48 INFO - TEST-PASS | dom/push/test/xpcshell/test_notification_http2.js | test_pushNotifications - [test_pushNotifications : 141] decoded message is incorrect - "Some message" == "Some message"
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 1 pending (2)
02:55:48 INFO - (xpcshell/head.js) | test test_pushNotifications finished (2)
02:55:48 INFO - dom/push/test/xpcshell/test_notification_http2.js | Starting test_complete
02:55:48 INFO - (xpcshell/head.js) | test test_complete pending (2)
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 1 finished (2)
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 2 pending (2)
02:55:48 INFO - (xpcshell/head.js) | test test_complete finished (2)
02:55:48 INFO - (xpcshell/head.js) | test run_next_test 2 finished (1)
02:55:48 INFO - exiting test
02:55:48 INFO - "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "ReferenceError: reference to undefined property txn.result" {file: "resource://gre/modules/IndexedDBHelper.jsm" line: 153}]"
02:55:48 WARNING - TEST-UNEXPECTED-FAIL | dom/push/test/xpcshell/test_notification_http2.js | test_complete - [test_complete : 221] A promise chain failed to handle a rejection: Error: Error removing subscription: 2147500036 - rejection date: Mon Feb 08 2016 02:55:47 GMT-0800 (PST) - stack: PushServiceDelete.prototype.onStopRequest@resource://gre/modules/PushServiceHttp2.jsm:207:21
02:55:48 INFO - observe@/builds/slave/test/build/tests/xpcshell/tests/dom/push/test/xpcshell/head.js:41:7
02:55:48 INFO - _execute_test@/builds/slave/test/build/tests/xpcshell/head.js:614:5
02:55:48 INFO - @-e:1:1
02:55:48 INFO - - false == true
02:55:48 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:221
02:55:48 INFO - /builds/slave/test/build/tests/xpcshell/head.js:_execute_test:623
02:55:48 INFO - -e:null:1
02:55:48 INFO - exiting test
02:55:48 INFO - <<<<<<<
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 11•9 years ago
|
||
Thanks.
This is a bug in PushServiceHttp2.js I file a bug and made a patch.
Flags: needinfo?(dd.mozilla)
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 12•9 years ago
|
||
Keywords: checkin-needed
Comment 13•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee | ||
Comment 14•9 years ago
|
||
Comment on attachment 8715769 [details] [diff] [review]
bug_1242755_v1.patch
Approval Request Comment
[Feature/regressing bug #]: Bug 1238910
[User impact if declined]: Bug 1238910 changed how nsHttpConnectionMgr is shut down, but this cause some shutdown hangs mostly in nss. e.g https://crash-stats.mozilla.com/report/index/b6075322-659b-4d92-b28b-a30202160208#allthreads
[Describe test coverage new/current, TreeHerder]: This code path is executed during any shutdown.
[Risks and why]: Low risk it reverse the behavior as it was before bug 1238910 landed
[String/UUID change made/needed]:none
Attachment #8715769 -
Flags: approval-mozilla-aurora?
Comment 15•9 years ago
|
||
Comment on attachment 8715769 [details] [diff] [review]
bug_1242755_v1.patch
Backs out bug 1238910, to prevent shutdown hangs.
Please uplift to aurora
Attachment #8715769 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 16•9 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•