Closed Bug 1246778 Opened 4 years ago Closed 4 years ago

dont loop in nshttpconnection during shutdown

Categories

(Core :: Networking: HTTP, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox45 --- fixed
firefox46 --- fixed
firefox47 --- fixed

People

(Reporter: mcmanus, Assigned: mcmanus)

References

Details

(Whiteboard: [necko-active])

Attachments

(1 file)

dont loop in nshttpconnection onsocketreadable/onsocketwritable during shutdown
see https://bugzilla.mozilla.org/show_bug.cgi?id=1158189#c36
Whiteboard: [necko-active]
Attachment #8717175 - Flags: review?(dd.mozilla) → review+
Ehm.. this is a hotfix, right?  We should probably fix the true cause here, is there a bug for it?
https://hg.mozilla.org/mozilla-central/rev/5864b1e738f6
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(In reply to Honza Bambas (:mayhemer) from comment #4)
> Ehm.. this is a hotfix, right?  We should probably fix the true cause here,
> is there a bug for it?

bug  1247205
(In reply to Dragana Damjanovic [:dragana] from comment #6)
> (In reply to Honza Bambas (:mayhemer) from comment #4)
> > Ehm.. this is a hotfix, right?  We should probably fix the true cause here,
> > is there a bug for it?
> 
> bug  1247205

Thank you.
Dragana, Honza,

so since this landed in the 02.10 nightly we've got 2 mozilla::net::nsHttpConnectionMgr::Shutdown hangs

https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-72fe62160211#allthreads
https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-72fe62160211#allthreads

neither of them have the socket thread actively doing stuff in nsHttpConnection anymore - which was the expected impact of this bug. So that's good (very small sample, of course) - the previous day (before this cset landed) 2/3 of the reports had nsHttpConnection on the stack.

However those two reports are odd. The second one actually has a dialog window open involving the auto dialer. The mind boggles on that one - I have no idea if that might block shutdown or not .. but we can defer dealing with it I think as its probably uncommon.

however I don't understand the first one. The socket transport thread appears to be missing. Presumably shutdown. but the main thread hasn't figured that out yet. any ideas on how to explain that?
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dd.mozilla)
(In reply to Patrick McManus [:mcmanus] from comment #8)
> Dragana, Honza,
> 
> so since this landed in the 02.10 nightly we've got 2
> mozilla::net::nsHttpConnectionMgr::Shutdown hangs
> 
> https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-
> 72fe62160211#allthreads
> https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-
> 72fe62160211#allthreads

It's the same link.

Thread 4 is blocked, some SSL stuff.  Hard to say.


The other report is probably similar to bug 1152041.  I believe the dialog is either hidden or overlooked by the user.  We should close it (if possible) or make it block the shutdown (disallow shutdown - it's possible) and bring the window to the front.  Or remove it completely :D


> 
> neither of them have the socket thread actively doing stuff in
> nsHttpConnection anymore - which was the expected impact of this bug. So
> that's good (very small sample, of course) - the previous day (before this
> cset landed) 2/3 of the reports had nsHttpConnection on the stack.
> 
> However those two reports are odd. The second one actually has a dialog
> window open involving the auto dialer. The mind boggles on that one - I have
> no idea if that might block shutdown or not .. but we can defer dealing with
> it I think as its probably uncommon.
> 
> however I don't understand the first one. The socket transport thread
> appears to be missing. Presumably shutdown. but the main thread hasn't
> figured that out yet. any ideas on how to explain that?
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #10)
> (In reply to Patrick McManus [:mcmanus] from comment #8)
> > Dragana, Honza,
> > 
> > so since this landed in the 02.10 nightly we've got 2
> > mozilla::net::nsHttpConnectionMgr::Shutdown hangs
> > 
> > https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-
> > 72fe62160211#allthreads
> > https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-
> > 72fe62160211#allthreads
> 
> It's the same link.
> 
> Thread 4 is blocked, some SSL stuff.  Hard to say.
> 
> 
> The other report is probably similar to bug 1152041.  I believe the dialog
> is either hidden or overlooked by the user.  We should close it (if
> possible) or make it block the shutdown (disallow shutdown - it's possible)
> and bring the window to the front.  Or remove it completely :D
> 

It is thread 4, I have seen suck logs.

About the second one, I know that they exist, but I decided to deal with them later, there are more important ones. My idea was just to close dialog.
Flags: needinfo?(dd.mozilla)
(In reply to Dragana Damjanovic [:dragana] from comment #11)

https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-72fe62160211#allthreads

> It is thread 4, I have seen suck logs.

Thanks to you both. I should have seen that. Odd that its just part of the stack.

So it points to 

http://hg.mozilla.org/mozilla-central/annotate/ac39fba33c6d/nsprpub/pr/src/threads/prmon.c#l147

which is PR_Unlock on this stack
PR_EnterMonitor
ssl3_GatherCompleteHandshake 	security/nss/lib/ssl/ssl3gthr.c
ssl3_GatherAppDataRecord 	security/nss/lib/ssl/ssl3gthr.c
DoRecv 	security/nss/lib/ssl/sslsecur.c

which is pretty weird.. being blocked on LOCK is something you'd expect, but blocked on unlock would seem to imply that the lock variable itself (or perhaps the whole monitor) has been freed/corrupted while the LOCK was held. Or I _guess_ its possible this is just a snapshot and a total coincidence that its on the LOCK line when the watchdog kills it (and we're really spinning in a bigger loop). But that seems like a heck of a coincidence.

Probably related that this is shutdown and something has been freed:)

ni :keeler for thoughts. we can move this to another bug if there isn't one just for this. (david, this class of shutdown bug is currently a pretty big priority.)
Flags: needinfo?(dkeeler)
Hmm... we are looping here:

http://hg.mozilla.org/mozilla-central/annotate/ac39fba33c6d/nsprpub/pr/src/md/windows/w95cv.c#l140

 	nss3.dll!md_UnlockAndPostNotifies(0x01117080, 0x00000000, 0x00000000) Line 141	C
	nss3.dll!_PR_MD_UNLOCK(0x01117080) Line 363	C
 	nss3.dll!PR_Unlock(0x01117064) Line 328	C
right.. releasing a lock.. which is why I'm suspicious that the lock has been corrupted
The move of nsHttpConnectionMgr shutdown, had cause some strange shutdown hangs (also on Mac). ssl one can be one of them.
The patch that revert that is in central since today.
(In reply to Dragana Damjanovic [:dragana] from comment #15)
> The move of nsHttpConnectionMgr shutdown, had cause some strange shutdown
> hangs (also on Mac). ssl one can be one of them.
> The patch that revert that is in central since today.

which bug #? (there are so many :()
Bug 1238910 change when nsHttpConnectionMgr is shut down
bug 1242755 revert when nsHttpConnectionMgr is shut down.
just so I've got this right - you're suggesting that when 28069d20b125 from 1242755 lands in nightly it might/should resolve the ssl hang in comment 12? (that will be easy enough to track)
(In reply to Patrick McManus [:mcmanus] from comment #18)
> just so I've got this right - you're suggesting that when 28069d20b125 from
> 1242755 lands in nightly it might/should resolve the ssl hang in comment 12?
> (that will be easy enough to track)

The patch from 1238910 landed sometime mid January and I had a look at crash reports before mid January and I have not seen such stacks on Nightly.
It can be that bug 1238910 cause it but I am not completely sure :) Sorry, there was too many different hangs...
(In reply to Patrick McManus [:mcmanus] from comment #12)
> (In reply to Dragana Damjanovic [:dragana] from comment #11)
> 
> https://crash-stats.mozilla.com/report/index/215501b2-99c3-40bf-8151-
> 72fe62160211#allthreads
> 
> > It is thread 4, I have seen suck logs.
> 
> Thanks to you both. I should have seen that. Odd that its just part of the
> stack.
> 
> So it points to 
> 
> http://hg.mozilla.org/mozilla-central/annotate/ac39fba33c6d/nsprpub/pr/src/
> threads/prmon.c#l147
> 
> which is PR_Unlock on this stack
> PR_EnterMonitor
> ssl3_GatherCompleteHandshake 	security/nss/lib/ssl/ssl3gthr.c
> ssl3_GatherAppDataRecord 	security/nss/lib/ssl/ssl3gthr.c
> DoRecv 	security/nss/lib/ssl/sslsecur.c
> 
> which is pretty weird.. being blocked on LOCK is something you'd expect, but
> blocked on unlock would seem to imply that the lock variable itself (or
> perhaps the whole monitor) has been freed/corrupted while the LOCK was held.
> Or I _guess_ its possible this is just a snapshot and a total coincidence
> that its on the LOCK line when the watchdog kills it (and we're really
> spinning in a bigger loop). But that seems like a heck of a coincidence.
> 
> Probably related that this is shutdown and something has been freed:)
> 
> ni :keeler for thoughts. we can move this to another bug if there isn't one
> just for this. (david, this class of shutdown bug is currently a pretty big
> priority.)

I don't have any particular insight here, unfortunately. If this is something we should be tracking, let's open up a new bug.
Flags: needinfo?(dkeeler)
Comment on attachment 8717175 [details] [diff] [review]
dont loop in nshttpconnection during shutdown


Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: shutdown hangs from 1158189 (tens of thousands in crash stats) continue at fairly high rate
[Describe test coverage new/current, TreeHerder]: path gets executed on every test
[Risks and why]: very low risk - just check shutdown state and return early when we are shutting down
[String/UUID change made/needed]: none

This has significantly reduced the rate of 1158189 on nightly in crash-stats. It is mostly a workaround for a class of problems, of which 1247205 is likely a part.
Attachment #8717175 - Flags: approval-mozilla-beta?
Attachment #8717175 - Flags: approval-mozilla-aurora?
Comment on attachment 8717175 [details] [diff] [review]
dont loop in nshttpconnection during shutdown

Fix a top crash, taking it
Should be in 45 beta 7
Attachment #8717175 - Flags: approval-mozilla-beta?
Attachment #8717175 - Flags: approval-mozilla-beta+
Attachment #8717175 - Flags: approval-mozilla-aurora?
Attachment #8717175 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.