Closed
Bug 673438
Opened 13 years ago
Closed 6 years ago
crash [@ nsMsgProtocol::CloseSocket()] | [@ nsPop3Protocol::ProcessProtocolState] caused by code changes to enable detecting the server dropping the connection on an auth failure
Categories
(MailNews Core :: Networking: POP, defect)
Tracking
(thunderbird11-, thunderbird12-, thunderbird13-, thunderbird14-)
People
(Reporter: wsmwk, Unassigned)
References
Details
(4 keywords, Whiteboard: [regression:tb7?][not reproducible][bug 701533 more common])
Crash Data
Attachments
(2 files)
538 bytes,
patch
|
standard8
:
review+
|
Details | Diff | Splinter Review |
1.63 KB,
patch
|
standard8
:
review+
|
Details | Diff | Splinter Review |
crash [@ nsMsgProtocol::CloseSocket()] | [@ nsPop3Protocol::ProcessProtocolState] also @0x0 | nsMsgProtocol::CloseSocket() bf9fa1c7-0b9c-4df1-8fa2-a07292110606 (dgbole) TB7.0a EXCEPTION_ACCESS_VIOLATION_EXEC 0x5c32700 0 @0x5c32700 1 xul.dll nsMsgProtocol::CloseSocket mailnews/base/util/nsMsgProtocol.cpp:343 2 xul.dll nsPop3Protocol::ProcessProtocolState mailnews/local/src/nsPop3Protocol.cpp:4159 3 xul.dll nsMsgProtocol::OnDataAvailable mailnews/base/util/nsMsgProtocol.cpp:387 4 xul.dll nsInputStreamPump::OnStateTransfer netwerk/base/src/nsInputStreamPump.cpp:510 5 xul.dll nsInputStreamPump::OnInputStreamReady netwerk/base/src/nsInputStreamPump.cpp:400 6 xul.dll nsOutputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:114 7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:618 8 xul.dll NS_ProcessNextEvent_P objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:245 9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:110 There is also a variation that does not have pop3 in the stack bp-84a5678c-9ab4-4e32-99cb-69d7b2110705 EXCEPTION_ILLEGAL_INSTRUCTION 0xaf9d88 0 thunderbird.exe nsMsgProtocol::CloseSocket mailnews/base/util/nsMsgProtocol.cpp:310 1 thunderbird.exe nsMailboxProtocol::CloseSocket mailnews/local/src/nsMailboxProtocol.cpp:728 2 thunderbird.exe nsMsgProtocol::OnStopRequest mailnews/base/util/nsMsgProtocol.cpp:467 3 thunderbird.exe nsMailboxProtocol::OnStopRequest mailnews/local/src/nsMailboxProtocol.cpp:381 4 thunderbird.exe nsInputStreamPump::OnStateStop netwerk/base/src/nsInputStreamPump.cpp:578 5 thunderbird.exe nsInputStreamPump::OnInputStreamReady netwerk/base/src/nsInputStreamPump.cpp:403 6 xpcom_core.dll nsOutputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:112 7 xpcom_core.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:527
Reporter | ||
Comment 1•13 years ago
|
||
regression, or something like, because 99% of crashes are v6 and newer. Echoed by this comment in bp-2807f25f-1f64-49e8-851c-e497a2110818 "this is the 4th or 5th time Thunderbird has crashed since it's most recent update ... Prior to this update, Thunderbird had never crashed on my system." EXCEPTION_ACCESS_VIOLATION_EXEC 0x200020 0 @0x200020 1 xul.dll nsMsgProtocol::CloseSocket mailnews/base/util/nsMsgProtocol.cpp:343 2 xul.dll nsPop3Protocol::ProcessProtocolState mailnews/local/src/nsPop3Protocol.cpp:4159 3 xul.dll nsMsgProtocol::OnDataAvailable mailnews/base/util/nsMsgProtocol.cpp:387 4 xul.dll nsInputStreamPump::OnStateTransfer netwerk/base/src/nsInputStreamPump.cpp:510 5 xul.dll nsInputStreamPump::OnInputStreamReady netwerk/base/src/nsInputStreamPump.cpp:400
Keywords: regression,
topcrash
Comment 2•13 years ago
|
||
a pop3 protocol log might be useful here. There's no obvious fix (e.g., a null check). m_transport should be valid, so strans should also be valid.
Reporter | ||
Comment 3•13 years ago
|
||
requested logs from two reporters
Comment 4•13 years ago
|
||
From the log sent by a reporter, it looks like the OnStopRequest is coming after we've destroyed the pop3 sink, in the crash case, but not in the non-crashing case. So, in general, it seems to be coming later in the crash case. I'll investigate a bit more.
Comment 5•13 years ago
|
||
I suspect the regression might have to do with the code changes to enable detecting the server dropping the connection on an auth failure. From the protocol logs, it seems as if the failure case is when the server does not close the connection promptly in response to us sending the QUIT command. I'll see if I can duplicate that with the fake server.
Comment 6•13 years ago
|
||
Hmm, fakeserver never closes the socket, but relies on the client to do so. I'll try tweaking it to close the socket, perhaps after a slight delay.
Comment 7•13 years ago
|
||
Hmm, looking at the logs more closely, we're going from the send quit state to the error done state in the failing case. We're not even getting to the wait for response case, which I think means the server is dropping the connection immediately. Or, the SendData command is failing, when we send QUIT.
Reporter | ||
Comment 8•13 years ago
|
||
david, do we need more from bachastain?
Comment 9•13 years ago
|
||
I've requested a try server build with a little bit of extra pop3 logging: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-d0fc8b936460 It's basically our TB 7 beta build with a bit more logging to verify that SendData is failing. If someone who can reproduce this issue can try one of these builds when it comes out sometime later today (I'll update the bug when the build is ready), and log the failure, that would be helpful, thx.
Comment 10•13 years ago
|
||
Of course, the windows try server build failed - requested a new one - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-100eeec6abea - will update bug if that succeeds.
Comment 11•13 years ago
|
||
I'd like to get this onto the trunk and into aurora builds since try server is failing me on windows...
Assignee: nobody → dbienvenu
Attachment #562761 -
Flags: review?(mbanner)
Updated•13 years ago
|
Attachment #562761 -
Flags: review?(mbanner) → review+
Comment 12•13 years ago
|
||
tomorrow's nightly trunk build should have the extra logging - http://hg.mozilla.org/comm-central/rev/3e550b22c449
Reporter | ||
Comment 13•13 years ago
|
||
bienvenu, do we want a separate bug for this? ... on IRC you stated "we're pumping events from the nsPop3Protocol::Error function, which closes the socket out from under us" regarding bp-bab59133-4143-4e6f-9535-30e922110831 I've not found any other examples of this 47 line long stack. However, the top frames do roughly match up to a much shorter stack in the likes of bp-54433f4f-73b2-4e53-9ec8-ce33a2111017 and bp-2f00801d-b02c-4736-8bc5-89d1f2111019 (this stack is a low percentage for this signature - only 2 of 12 in a small sample from today).
Reporter | ||
Comment 14•13 years ago
|
||
so 80% of the stacks are like comment 1 ("the regression comment"), where frame 2 is nsPop3Protocol::ProcessProtocolState mailnews/local/src/nsPop3Protocol.cpp:4160 rather than nsPop3Protocol::OnStopRequest mailnews/local/src/nsPop3Protocol.cpp:999 in the case of comment 13. So comment 13 perhaps does need its own bug report
Reporter | ||
Comment 15•13 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #1) > regression, or something like, because 99% of crashes are v6 and newer. fallout from Bug 617721?? (startup) crash [@ nsPop3Protocol::LoadUrl(nsIURI*, nsISupports*)] and [@ nsPop3Protocol::LoadUrl] (Mac)
Keywords: regressionwindow-wanted
Comment 16•13 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #15) > fallout from Bug 617721?? > (startup) crash [@ nsPop3Protocol::LoadUrl(nsIURI*, nsISupports*)] and [@ > nsPop3Protocol::LoadUrl] (Mac) Highly doubtful. That only affects the start of the loading of the url, and only in certain error cases. The CloseSocket crash is after the whole url has run.
Reporter | ||
Comment 17•12 years ago
|
||
topcrash still. nicking this issue might go a long way toward fixing other topcrash signatures bp-6a08de55-dc15-4919-a796-0f3062111015 (sione, has gone back to version 3, plus a refreshed PC) bp-bab59133-4143-4e6f-9535-30e922110831 (matthew, crashes stopped in v7) bp-?? (dmthagar) I've asked for a protocol log bp-0257b213-1839-4692-a15d-682562110913 (André, waiting for feedback) bp-7d8b4cea-c474-4d61-947c-05e222120107 (rorscan, asked for protocol log) Happens also for trunk users bp-c6c0f0e4-5a38-407e-bdd7-e64cd2120102 version 10 As do other signatures for pop users, like rorscan who also sees mozalloc_abort(char const* const) | NS_DebugBreak_P | AbortIfOffMainThreadIfCheckFast arena_dalloc_small | arena_dalloc | free | NS_DestroyXPTCallStub_P @0x0 | nsASDOMWindowEnumerator::GetNext(nsISupports**) mozalloc_abort(char const* const) | NS_DebugBreak_P | AbortIfOffMainThreadIfCheckFast nsASDOMWindowEnumerator::GetNext(nsISupports**) in bp-5f240d4d-9133-43f1-8c9f-dce192111231 12/31/201110:08 bp-15704f27-8fc0-4a81-be39-96f9a2111218 12/18/20115:22 bp-786b43c5-f325-4af1-8ba5-5e72f2111205 12/5/20118:30 bp-07ad904c-04f8-4d7b-a41f-e468a2111112 11/12/20117:38
tracking-thunderbird11:
--- → ?
Whiteboard: [regression in v7?]
Reporter | ||
Comment 19•12 years ago
|
||
This is at risk for getting fixed in version 11. #3 crash for version 10, based on combined crash count of all signatures
Crash Signature: [@ nsMsgProtocol::CloseSocket()]
[@ @0x0 | nsMsgProtocol::CloseSocket()] → [@ nsMsgProtocol::CloseSocket()]
[@ @0x0 | nsMsgProtocol::CloseSocket()]
[@ nsRefPtr<nsTypedSelection>::~nsRefPtr<nsTypedSelection>() | `anonymous namespace''::ServerSocketListenerProxy::Release()]
Comment 20•12 years ago
|
||
This didn't make 11, but we'll see if there's anything we can do for 12.
Reporter | ||
Comment 23•12 years ago
|
||
per bienvenu "I'd need is access to a server & setup that caused this bug because I can't reproduce it here. " no testcase from any reporters yet :(
Keywords: testcase-wanted
Comment 24•12 years ago
|
||
I've requested a try server build with this patch, which adds a bit of logging when the protocol object is destroyed, and moves the CloseSocket call earlier in the sequence, to see if that has an effect on the crash location. I'll post a link to the try server build (which will be off the beta branch) when it's available. This is probably not a fix per se, because if the protocol object is getting deleted out from under itself, there's some sort of ref-counting or ownership model problem. But a protocol log from someone seeing the crash from a build with this patch would be very helpful.
Comment 25•12 years ago
|
||
builds should show up here - http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/bienvenu@nventure.com-2524c134a535/ - only linux ones are there so far.
Comment 26•12 years ago
|
||
If we can get some testing on the try server builds, it'd be really useful.
tracking-thunderbird13:
--- → +
Reporter | ||
Comment 27•12 years ago
|
||
(In reply to Mark Banner (:standard8) from comment #26) > If we can get some testing on the try server builds, it'd be really useful. unfortunately out of a ffew dozen contacts over the past few months, mostly people either can't reproduce at all or arent' responding. For Sione, who is the most prolific crasher, David needs access to his server - which isn't happening. I'm about ready to give up on this hunt. Perhaps it would be best to get debug info actually IN the dumps reported to crash-stats, so we don't have to do these intensive goose chases.
Comment 28•12 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #27) > (In reply to Mark Banner (:standard8) from comment #26) > > If we can get some testing on the try server builds, it'd be really useful. > > unfortunately out of a ffew dozen contacts over the past few months, mostly > people either can't reproduce at all or arent' responding. Can't reproduce with the try build? Or can't reproduce in general? > > For Sione, who is the most prolific crasher, David needs access to his > server - which isn't happening. Having him run the try server build would be fine. > > I'm about ready to give up on this hunt. Perhaps it would be best to get > debug info actually IN the dumps reported to crash-stats, so we don't have > to do these intensive goose chases. I'm happy to land the patch as is and see if it helps. But I don't know how to add logging information to dumps reported to crash-stats.
Reporter | ||
Comment 29•12 years ago
|
||
(In reply to David :Bienvenu from comment #28) > (In reply to Wayne Mery (:wsmwk) from comment #27) > (In reply to Mark > Banner (:standard8) from comment #26) > > If we can get some testing on the > try server builds, it'd be really useful. > > unfortunately out of a ffew > dozen contacts over the past few months, mostly > people either can't > reproduce at all or arent' responding. Can't reproduce with the try build? > Or can't reproduce in general? > > For Sione, who is the most prolific > crasher, David needs access to his > server - which isn't happening. Having > him run the try server build would be fine. > > I'm about ready to give up > on this hunt. Perhaps it would be best to get > debug info actually IN the > dumps reported to crash-stats, so we don't have > to do these intensive > goose chases. I'm happy to land the patch as is and see if it helps. But I > don't know how to add logging information to dumps reported to crash-stats.
Reporter | ||
Comment 30•12 years ago
|
||
(Ignore comment 29 please ) (In reply to David :Bienvenu from comment #28) > (In reply to Wayne Mery (:wsmwk) from comment #27) > > (In reply to Mark Banner (:standard8) from comment #26) > > > If we can get some testing on the try server builds, it'd be really useful. > > > > unfortunately out of a ffew dozen contacts over the past few months, mostly > > people either can't reproduce at all or arent' responding. > > Can't reproduce with the try build? Or can't reproduce in general? The later. And for those few that I contacted specifically about the last try build, I think only 2 of ~10 people replied. Sione being one. > > For Sione, who is the most prolific crasher, David needs access to his > > server - which isn't happening. > Having him run the try server build would be fine. > > > > I'm about ready to give up on this hunt. Perhaps it would be best to get > > debug info actually IN the dumps reported to crash-stats, so we don't have > > to do these intensive goose chases. > > I'm happy to land the patch as is and see if it helps. Closing in on a year anniversary, we probably do have to take our best shot at it. However, it will not be sufficient to have it only in the daily build. The crash rate is too low - only 7 crashes in the past 4 months for all betas and alphas (v12/13/14)[1]. I think we need to drive the patch into the beta. Then, if it doesn't break anything, it will still take us 2-3 weeks to know if it had any impact. And if that doesn't work, we might blast a message via crash-stats to all the crashers that we need volunteers to work with the developer in devising a fix. > But I don't know how to add logging information to dumps reported to crash-stats. yeah, me neither. I wonder for how many issues an internal log might be helpful?? a separate discussion perhaps? [1] 4 weeks of non-production crashes - https://crash-stats.mozilla.com/report/list?product=Thunderbird&version=Thunderbird%3A14.0a1&version=Thunderbird%3A13.0a2&version=Thunderbird%3A12.0b4&version=Thunderbird%3A12.0b3&version=Thunderbird%3A12.0b2&version=Thunderbird%3A12.0b1&query_search=signature&query_type=exact&query=nsMsgProtocol%3A%3ACloseSocket%28%29&reason_type=contains&date=04%2F18%2F2012%2014%3A57%3A55&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsMsgProtocol%3A%3ACloseSocket%28%29
Comment 31•12 years ago
|
||
In general, the logging information is way too private to include in crash-stats. We'd have to sanitize it to the extent that it's hard to know if it would be useful.
Reporter | ||
Comment 32•12 years ago
|
||
one v13 crash in 4 weeks is bp-ff44a4a4-2c33-48e3-ad0c-a03062120409 - attempting to contact reporter.
Reporter | ||
Comment 33•12 years ago
|
||
hmm, if bug 701533 is the same issue, and if we give the patch a week on v13 alpha (soon to be beta) then we can probably assess it's effectiveness. Because in 4 weeks we had 5 v14 crashes and 10 v13 crashes. can we get it into the v13 beta?
Comment 34•12 years ago
|
||
Comment on attachment 613031 [details] [diff] [review] a bit more logging, move CloseSocket call - checked in. The idea behind moving the CloseSocket call is that perhaps the code before it might have been causing the pop3 protocol object's ref count to go to zero and getting deleted. This isn't a proper fix, but if the crashes go down or move, that'll be a clue. And logging the destructor will also tell us if that's happening.
Attachment #613031 -
Flags: review?(mbanner)
Updated•12 years ago
|
Attachment #613031 -
Flags: review?(mbanner) → review+
Comment 35•12 years ago
|
||
extra logging and moving of close socket landed - http://hg.mozilla.org/comm-central/rev/ff9100b84368
Status: NEW → ASSIGNED
Updated•12 years ago
|
Attachment #613031 -
Attachment description: a bit more logging, move CloseSocket call → a bit more logging, move CloseSocket call - checked in.
Comment 36•12 years ago
|
||
Still tracking this, we'll have to watch how the latest changes are going.
tracking-thunderbird14:
--- → +
Comment 37•12 years ago
|
||
Taking off tracking for now. Might be good to see if the changes landed previously have affected anything.
Reporter | ||
Comment 38•12 years ago
|
||
sadly, no one can reproduce.
Whiteboard: [regression in v7?] → [regression in v7?][not reproducible for most]
Reporter | ||
Comment 39•12 years ago
|
||
bug 701533 is more common. and is topcrash also for 10.0.6esr
Whiteboard: [regression in v7?][not reproducible for most] → [regression in v7?][not reproducible for most][bug 701533 more common]
Reporter | ||
Comment 40•12 years ago
|
||
currently attempting to get data via Larry who reported bp-f9ef40ba-f082-4fd6-8c69-286702120721 and other crashes
Reporter | ||
Comment 41•12 years ago
|
||
Larry is no longer able to reproduce (In reply to Wayne Mery (:wsmwk) from comment #40) > currently attempting to get data via Larry who reported > bp-f9ef40ba-f082-4fd6-8c69-286702120721 and other crashes perhaps a related crash is Bug 637338, which is another signature Larry was hitting
Comment 42•11 years ago
|
||
With combined signatures, it's #42 top crasher in TB 17.0.2 so not a top crasher.
Keywords: topcrash
Reporter | ||
Updated•11 years ago
|
Whiteboard: [regression in v7?][not reproducible for most][bug 701533 more common] → [regression:tb7?][not reproducible][bug 701533 more common]
Reporter | ||
Updated•10 years ago
|
Assignee: mozilla → nobody
Reporter | ||
Updated•10 years ago
|
Status: ASSIGNED → NEW
Comment 43•9 years ago
|
||
Removing myslef on all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.
Updated•9 years ago
|
Crash Signature: [@ nsMsgProtocol::CloseSocket()]
[@ @0x0 | nsMsgProtocol::CloseSocket()]
[@ nsRefPtr<nsTypedSelection>::~nsRefPtr<nsTypedSelection>() | `anonymous namespace''::ServerSocketListenerProxy::Release()] → [@ nsMsgProtocol::CloseSocket()]
[@ @0x0 | nsMsgProtocol::CloseSocket()]
[@ nsRefPtr<nsTypedSelection>::~nsRefPtr<nsTypedSelection>() | `anonymous namespace''::ServerSocketListenerProxy::Release()]
[@ nsMsgProtocol::CloseSocket]
[@ @0x0 | nsMsgProtoco…
Reporter | ||
Comment 44•7 years ago
|
||
bp-bd2dce79-98e2-4a15-85d2-e24102170324 is perhaps a current example. But an extremely rare example - only 1 in 3 crashes have the same stack. per comment 5 caused by "code changes to enable detecting the server dropping the connection on an auth failure". The reduction in crash rate since then might be because of patches or changes in password handling. Some are cited in https://hg.mozilla.org/comm-central/log/tip/mailnews/local/src/nsPop3Protocol.cpp ... but one must doubt the underlying protocol bugs have been fixed.
Summary: crash [@ nsMsgProtocol::CloseSocket()] | [@ nsPop3Protocol::ProcessProtocolState] → crash [@ nsMsgProtocol::CloseSocket()] | [@ nsPop3Protocol::ProcessProtocolState] caused by code changes to enable detecting the server dropping the connection on an auth failure
Reporter | ||
Comment 45•6 years ago
|
||
No crashes after 17.0.8 (which are still being reported)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•