Closed
Bug 1292181
Opened 8 years ago
Closed 8 years ago
Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent - crash in nsHttpConnectionMgr
Categories
(Core :: Networking, defect)
Tracking
()
RESOLVED
FIXED
mozilla51
People
(Reporter: dragana, Assigned: dragana)
Details
(Keywords: topcrash, Whiteboard: [necko-active])
Crash Data
Attachments
(1 file, 2 obsolete files)
4.33 KB,
patch
|
dragana
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent
The change that we made was bug 698882. We tried to uplift the bug to 47 and 48 needed to backed it out from both when they went into beta - 1279163 and bug 698882 comment 143.
Assignee | ||
Updated•8 years ago
|
Summary: Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent → Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent on 49Beta
Assignee | ||
Updated•8 years ago
|
Whiteboard: [necko-active]
Assignee | ||
Comment 1•8 years ago
|
||
Bug 1158189 was originally fixed by switching to MSVC 2015. It is worth asking if there is a difference between beta and aurora regarding toolchain.
Comment 2•8 years ago
|
||
[Tracking Requested - why for this release]: Currently the top crash in beta. We should track this.
tracking-firefox49:
--- → ?
Comment 3•8 years ago
|
||
Fixing signature so it shows up in crash stats.
Crash Signature: shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent]
status-firefox49:
--- → affected
Assignee | ||
Comment 4•8 years ago
|
||
Assignee | ||
Comment 5•8 years ago
|
||
There is at least 10 different signatures under this one. The line that we can search is to short, it would be good if signature is at least 5 lines deeper into the stack.
:marcia do you know who we can ask about this? It is very hard to search.
Flags: needinfo?(mozillamarcia.knous)
Assignee | ||
Comment 6•8 years ago
|
||
I look through 100 out of 380 crashes from the last week for ff51, 50 and 49aurora and nightly (not beta)
there are 31 hangs in nsHttpConnectionMgr -> our old shutdown hang that hos just hidden :( :( :(
that is approx. 117 crashes on nightly and aurora in last week.
Comment 7•8 years ago
|
||
(In reply to Dragana Damjanovic [:dragana] from comment #5)
> There is at least 10 different signatures under this one. The line that we
> can search is to short, it would be good if signature is at least 5 lines
> deeper into the stack.
>
> :marcia do you know who we can ask about this? It is very hard to search.
For Socorro searching, that would be someone in that team - #breakpad is a good place to start.
Flags: needinfo?(mozillamarcia.knous)
Assignee | ||
Comment 8•8 years ago
|
||
in 100 crashes there is 10 crashes on nightly and 21 on aurora, that means we have around 38 on nightly and 79 on aurora, this is slightly more than what we use to have.
Assignee | ||
Comment 9•8 years ago
|
||
(In reply to Marcia Knous [:marcia - use ni] from comment #7)
> (In reply to Dragana Damjanovic [:dragana] from comment #5)
> > There is at least 10 different signatures under this one. The line that we
> > can search is to short, it would be good if signature is at least 5 lines
> > deeper into the stack.
> >
> > :marcia do you know who we can ask about this? It is very hard to search.
>
> For Socorro searching, that would be someone in that team - #breakpad is a
> good place to start.
After a chat on #breakpad
I have submitted bug 1292608.
Comment 10•8 years ago
|
||
Comment on attachment 8778280 [details] [diff] [review]
bug_1292181.patch
Review of attachment 8778280 [details] [diff] [review]:
-----------------------------------------------------------------
my other concern here is PollableEvent does not write to the pipe when called on the socket thread as compared to the old system. That was actually a deadlock, but you could work around that with the awful 2048 check for the moment - do you think it should do so again?
I'm on PTO for the next week.. nick or hozna can review.
::: netwerk/base/PollableEvent.cpp
@@ +220,5 @@
> if (PR_GetCurrentThread() == gSocketThread) {
> SOCKET_LOG(("PollableEvent::Signal OnSocketThread nop\n"));
> return true;
> }
> +#ifdef XP_WIN
so because of the ifdef you've got two { and one }.. I hate that :(
can you rewrite the ifdef into a bool assignment if just put the bool into the conditional?
@@ +221,5 @@
> SOCKET_LOG(("PollableEvent::Signal OnSocketThread nop\n"));
> return true;
> }
> +#ifdef XP_WIN
> + if (mSignaled >= 2048) {
oh dear. write the comment :)
@@ +245,5 @@
> // necessary because of the "dont signal on socket thread" optimization
> MOZ_ASSERT(PR_GetCurrentThread() == gSocketThread);
>
> SOCKET_LOG(("PollableEvent::Clear\n"));
> + mSignaled = 0;
I think this needs to be decremented by the number actually read.. which sadly can be less than all of them by API..
also there is a branch below that has an assert about not reading more than 1 that needs to be suppressed
Attachment #8778280 -
Flags: review?(mcmanus) → review-
Assignee | ||
Comment 11•8 years ago
|
||
If this patch(some version of it) helps to decrease number of hangs, that means that writing more often to the socket help -> we should try to write to the socket more often during shutdown, something like a timer to send an event every 10ms(this is only for shutdown!)
Comment 12•8 years ago
|
||
49 top crash, tracking.
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Patrick McManus [:mcmanus] PTO until Aug 15 from comment #10)
> Comment on attachment 8778280 [details] [diff] [review]
> bug_1292181.patch
>
> Review of attachment 8778280 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> my other concern here is PollableEvent does not write to the pipe when
> called on the socket thread as compared to the old system. That was actually
> a deadlock, but you could work around that with the awful 2048 check for the
> moment - do you think it should do so again?
>
I have decided to be conservative because I want to uplift this to beta. I have changed back to the same behavior as before bug 698882, so we will signal also for events dispatched on SocketThread. And on nightly we can try to introduce some optimizations. The change will be only for Windows.
> I'm on PTO for the next week.. nick or hozna can review.
>
ok
Assignee | ||
Updated•8 years ago
|
status-firefox50:
--- → affected
status-firefox51:
--- → affected
Assignee | ||
Updated•8 years ago
|
Summary: Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent on 49Beta → Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent - crash in nsHttpConnectionMgr
Assignee | ||
Comment 14•8 years ago
|
||
Attachment #8778280 -
Attachment is obsolete: true
Attachment #8779638 -
Flags: review?(hurley)
Assignee | ||
Comment 15•8 years ago
|
||
Comment 16•8 years ago
|
||
Comment on attachment 8779638 [details] [diff] [review]
bug_1292181.patch
Review of attachment 8779638 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/base/PollableEvent.cpp
@@ +231,5 @@
> +
> +#ifndef XP_WIN
> + // To wake up the poll writing once is enough, but for Windows that can cause
> + // hangs so we will write for every event.
> + // For no Windows systems it is enough to write just once.
nit: non-Windows
Attachment #8779638 -
Flags: review?(hurley) → review+
Assignee | ||
Comment 17•8 years ago
|
||
Attachment #8779638 -
Attachment is obsolete: true
Attachment #8780122 -
Flags: review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 18•8 years ago
|
||
Pushed by kwierso@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ad3abb5627c
Let each dispatch event sends data to the socket. r=mcmanus yolo CLOSED TREE
Keywords: checkin-needed
Comment 19•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 20•8 years ago
|
||
so far there are no more crashes on nightly builds after the patch has landed: http://bit.ly/2bBBTHu :-)
can we uplift this to aurora and beta (where this is currently ~10% of all recorded browser crashes) now as well?
Comment 21•8 years ago
|
||
Dragana, can you request uplift, if you think it is not too risky? I'd like to get this into the beta 5 build on Thursday morning. This may fix our #2 top crash in beta. Thanks!
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 22•8 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #21)
> Dragana, can you request uplift, if you think it is not too risky? I'd like
> to get this into the beta 5 build on Thursday morning. This may fix our #2
> top crash in beta. Thanks!
I was waiting for couple of days to see if everything works fine. 2-3 days are enough because this is very much used code path. I will request an uplift.
Flags: needinfo?(dd.mozilla)
Assignee | ||
Comment 23•8 years ago
|
||
Comment on attachment 8780122 [details] [diff] [review]
bug_1292181.patch
Approval Request Comment
[Feature/regressing bug #]: Should fix #2 top crash in beta. The crash is very low volume on Nightly and Aurora so it is only possible to see an improvement on beta.
[User impact if declined]: shutdown hang on Windows
[Describe test coverage new/current, TreeHerder]: It heavily used code path.
[Risks and why]: low
[String/UUID change made/needed]: none
Attachment #8780122 -
Flags: approval-mozilla-beta?
Attachment #8780122 -
Flags: approval-mozilla-aurora?
Comment 24•8 years ago
|
||
Comment on attachment 8780122 [details] [diff] [review]
bug_1292181.patch
High volume crash on beta, let's take this for aurora and for beta 5.
Thanks Dragana!
Attachment #8780122 -
Flags: approval-mozilla-beta?
Attachment #8780122 -
Flags: approval-mozilla-beta+
Attachment #8780122 -
Flags: approval-mozilla-aurora?
Attachment #8780122 -
Flags: approval-mozilla-aurora+
Comment 25•8 years ago
|
||
bugherder uplift |
Comment 26•8 years ago
|
||
bugherder uplift |
Updated•8 years ago
|
Crash Signature: [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent] → [@ shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | nsEventQueue::GetEvent]
[@ shutdownhang | mozilla::CondVar::Wait | nsEventQueue::GetEvent]
Comment 27•8 years ago
|
||
some of these crashes still seem to be happening in 49.0b5: http://bit.ly/2b7Q6GB - but i think in reduced volume (it is still a bit early to tell for sure)
Assignee | ||
Comment 28•8 years ago
|
||
(In reply to [:philipp] from comment #27)
> some of these crashes still seem to be happening in 49.0b5:
> http://bit.ly/2b7Q6GB - but i think in reduced volume (it is still a bit
> early to tell for sure)
Thanks, Philipp!
This patch will not remove the crash completely, but just reduce it to the volume the signatures from bug 1158189 had on 48 and earlier.
Comment 29•8 years ago
|
||
Bug 1272911 should hopefully sort out splitting the signature.
Assignee | ||
Comment 30•8 years ago
|
||
(In reply to [:philipp] from comment #27)
> some of these crashes still seem to be happening in 49.0b5:
> http://bit.ly/2b7Q6GB - but i think in reduced volume (it is still a bit
> early to tell for sure)
Looking at 49b4, 49b5, 49b6 and 49b7:
49b4 has 3765 crashes
49b5 1527
49b6 1028
49b7 592
This is total number not per week number. This is reduced to do volume of bug 1158189 in beta48 and earlier. That is what I expected with this patch.
Comment 31•8 years ago
|
||
Thank you Dragana for the fix! I agree that it looks greatly improved for 49.
You need to log in
before you can comment on or make changes to this bug.
Description
•