Last Comment Bug 531794 - crash [@ nsNntpCacheStreamListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int)]
: crash [@ nsNntpCacheStreamListener::OnDataAvailable(nsIRequest*, nsISupports*...
Status: RESOLVED FIXED
[needs verification post 3.0.1]
: crash, fixed-seamonkey2.0.3, regression
Product: MailNews Core
Classification: Components
Component: Networking: NNTP (show other bugs)
: 1.9.1 Branch
: x86 All
: -- critical with 1 vote (vote)
: Thunderbird 3.1a1
Assigned To: timeless
:
Mentors:
: 536806 (view as bug list)
Depends on: 130442
Blocks:
  Show dependency treegraph
 
Reported: 2009-11-30 03:10 PST by Wayne Mery (:wsmwk, NI for questions)
Modified: 2011-06-13 10:01 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
.1-fixed


Attachments
patch (1.55 KB, patch)
2009-12-25 04:04 PST, timeless
Pidgeot18: review+
mozilla: superreview+
Details | Diff | Review
updated (1.63 KB, patch)
2010-01-02 19:48 PST, timeless
timeless: review+
timeless: superreview+
standard8: approval‑thunderbird3.0.1+
Details | Diff | Review

Description Wayne Mery (:wsmwk, NI for questions) 2009-11-30 03:10:37 PST
crash [@ nsNntpCacheStreamListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int)]
#28 crash for 3.0, 0.6% of 3.0 crashes, all OS
suggest this is regression, because no crashes for 3.0b4 and none for at least the 4 months prior to 20091103030839 build

the ones with comments...
bp-46f513e7-6dcf-4063-8dfc-15dfc2091125 v3.0.1pre 20091123031747
"Using 'n' key navigtion in a news thread. possibly was punching the key to rapidly"
0	thunderbird.exe	nsNntpCacheStreamListener::OnDataAvailable	 mailnews/news/src/nsNNTPProtocol.cpp:713
1	thunderbird.exe	nsInputStreamPump::OnStateTransfer	netwerk/base/src/nsInputStreamPump.cpp:508
2	thunderbird.exe	nsInputStreamPump::OnInputStreamReady	netwerk/base/src/nsInputStreamPump.cpp:398
3	xpcom_core.dll	nsInputStreamReadyEvent::Run	xpcom/io/nsStreamUtils.cpp:111  

bp-b195ddd1-6147-4fc5-8ef7-beeb42091114
quit Shredder using Cmd+Q, was viewing a newsgroup

bp-7a3c6881-f56b-4091-81d7-c0f9d2091125
Comment 1 Joshua Cranmer [:jcranmer] 2009-11-30 04:31:50 PST
The stacks look suspiciously like the crashes in bug 130442 were just shunted to another portion of the code.
Comment 2 David :Bienvenu 2009-11-30 06:48:00 PST
that's possible, but I suspect only partly true, in that I know when I reproduced the bug, I was able to continue on from the crash in OnStartRequest, and I didn't crash in OnDataAvailable. But we should definitely add some more checks since we're crashing here.
Comment 3 abittner 2009-12-24 11:33:02 PST
trying to help a bit: currently there are two similarly named crashes in the topcrashes of 3.0, some position #20 and some #80, number tweny is the one being discussed here OnDataAvailable, and the other is OnStopRequest.

#20:
http://crash-stats.mozilla.com/report/list?product=Thunderbird&version=Thunderbird%3A3.0&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsNntpCacheStreamListener%3A%3AOnDataAvailable%28nsIRequest%2A%2C%20nsISupports%2A%2C%20nsIInputStream%2A%2C%20unsigned%20int%2C%20unsigned%20int%29

when i look at the comments for these two types of crashes, there seem to be some relation.

#80:
http://crash-stats.mozilla.com/report/list?product=Thunderbird&version=Thunderbird%3A3.0&platform=windows&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=nsNntpCacheStreamListener%3A%3AOnStopRequest%28nsIRequest%2A%2C%20nsISupports%2A%2C%20unsigned%20int%29

many comments talk about simply navigating public nntp servers with thunderbird 3.0 and clicking through folders and messages, never to happen anything at all or usenet message displaying being very slowly and delayed and resulting in these two kinds of crash types.

i am also affected by these kinds of crashes. ever since the upgrade to tb3.0 i have experienced these kind of erratic behaviours as well exactly the same way.
regards.
Comment 4 timeless 2009-12-25 04:04:09 PST
Created attachment 419157 [details] [diff] [review]
patch

please note that the ODA contract requires *someone* to read the data, so you can't simply ignore the call...
Comment 5 David :Bienvenu 2009-12-26 07:15:04 PST
FWIW, I hit the formerly crashing code in OnStopRequest very frequently, but I've never crashed in ODA, so I'm fairly certain that there are different causes at work.
Comment 6 abittner 2009-12-29 03:07:07 PST
i seem to be getting the ondataavailable bug more often whenever the newsgroups messages are only fetched as headers in the lists/overview, and only when navigating through these header/subject elements and clicking them makes the tb 3.0 client establish smallish nntp connections to the usenet server and trying to fetch the rest of the body (to complete the whole message that was being clicked on or activated). during these times i seem to get more ondataavailable crashes than the other bug with the onstoprequest.

if im not mistaken i was able to repro this behaviour roughly when subscribing to a newsgroup and first completely synchronizing all the messages to my local disk and then doing the quick browsing/clicks in the already present nntp messages in that group. this mostly caused the onstoprequest crashes.

then i went for rightclick that group and refresh/rebuild index of that group, and then doing again the "impatient" navigation in that usenetgroup with only the headers of the messages available at first, causes more of those ondataavailable crashes.

also navigating via keyboard commands rather than mouse/click devices up and down the message list within a newsgroup seems to be a much quicker and easier way to repro these crashes.

maybe keyboard events get handled quicker or fill buffers more quickly and choke threads or that kind of stuff. please see the comments in the other bugzilla entry to these two bugs.

regards.
Comment 7 Joshua Cranmer [:jcranmer] 2010-01-01 13:32:05 PST
Comment on attachment 419157 [details] [diff] [review]
patch

>diff --git a/mailnews/news/src/nsNNTPProtocol.cpp b/mailnews/news/src/nsNNTPProtocol.cpp
>--- a/mailnews/news/src/nsNNTPProtocol.cpp
>+++ b/mailnews/news/src/nsNNTPProtocol.cpp
>@@ -684,13 +684,15 @@ NS_IMETHODIMP
> nsNntpCacheStreamListener::OnStopRequest(nsIRequest *request, nsISupports * aCtxt, nsresult aStatus)
> {
>   nsCOMPtr <nsIRequest> ourRequest = do_QueryInterface(mChannelToUse);
>-  nsresult rv = mListener->OnStopRequest(ourRequest, aCtxt, aStatus);
>+  nsresult rv = NS_OK;
>+  if (mListener)
>+    mListener->OnStopRequest(ourRequest, aCtxt, aStatus);

Could you add an NS_ASSERTION on the mListener being there?

Other than that, r+.
Comment 8 Phil Ringnalda (:philor) 2010-01-01 23:09:38 PST
Comment on attachment 419157 [details] [diff] [review]
patch

Note that timeless will almost certainly never look at this bug again, or address review comments, or check it in, so someone else will need to pick all that up.
Comment 9 David :Bienvenu 2010-01-02 06:58:31 PST
Comment on attachment 419157 [details] [diff] [review]
patch

sr=me, modulo the assertion per jcranmer.
Comment 10 Ludovic Hirlimann [:Usul] 2010-01-02 08:21:29 PST
(In reply to comment #8)
> (From update of attachment 419157 [details] [diff] [review])
> Note that timeless will almost certainly never look at this bug again, or
> address review comments, or check it in, so someone else will need to pick all
> that up.

Phil want to take it from here ?
Comment 11 Phil Ringnalda (:philor) 2010-01-02 17:39:09 PST
Nope: I'm timeless' checkin bitch for things that can land as-is, but I figure review comments are between his reviewers and him (and you can see by searching for my "[timeless: need...]" whiteboards that I'm not kidding or exaggerating when I say that he will not be the one to address them, so either his reviewers or someone else who wants the patch landed badly enough will have to).
Comment 12 timeless 2010-01-02 19:48:12 PST
Created attachment 419800 [details] [diff] [review]
updated

philor is correct. i'm on vacation atm, but even if i wasn't, i'm generally incredibly unlikely to do such work (especially as while I'm working, I don't work on mail, or pretty much anything where reviewer focus interests my employer). if someone wants an assertion added, it's better for them to add it, as they're much more likely to know what they want it to say.
Comment 13 Mark Banner (:standard8) 2010-01-04 02:55:53 PST
Checked into trunk: http://hg.mozilla.org/comm-central/rev/fb34087a1b2e
Comment 14 Wayne Mery (:wsmwk, NI for questions) 2010-01-04 04:40:46 PST
*** Bug 536806 has been marked as a duplicate of this bug. ***
Comment 15 abittner 2010-01-05 05:15:53 PST
i was trying todays 20100105 build from
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.1/thunderbird-3.0.1pre.en-US.win32.installer.exe

how do i actually find out when that above patch in trunk arrived in a non-trunk (branch?) release or nightly build? im an not that familiar with all the mozilla tools and semi-public systems, websites and mechanisms.


the signatures are onstoprequest though:
bp-2a60221b-201a-49b6-b1d5-454642100105
bp-12a3a33f-f0fa-4a2f-b5b1-8d59b2100105
bp-ba34ca67-f7ac-4148-a7bc-69fe82100105
regards.
Comment 16 Mark Banner (:standard8) 2010-01-05 05:32:51 PST
(In reply to comment #15)
> i was trying todays 20100105 build from
> http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.1/thunderbird-3.0.1pre.en-US.win32.installer.exe
> 
> how do i actually find out when that above patch in trunk arrived in a
> non-trunk (branch?) release or nightly build? im an not that familiar with all
> the mozilla tools and semi-public systems, websites and mechanisms.

You'll need to try the builds from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.2/ (dated 4th or later).

This patch hasn't gone in for the 1.9.1 builds yet.
Comment 17 abittner 2010-01-05 06:02:00 PST
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b6pre) Gecko/20100105 Lanikai/3.1a1pre

seems to be harder to crash, couldn't reproduce a crash yet, but this might not mean that a 3.0.x build wont crash as the 3.1a1pre build seems to be much more reactive and has way lower latency than the tested 3.0.1pre or the 3.0 final release.

would be good to test a real 3.0.1build with this patch.
but anyways, many for the fix, the work and your efforts.
best regards.
Comment 18 Mark Banner (:standard8) 2010-01-05 06:04:59 PST
(In reply to comment #17)
> would be good to test a real 3.0.1build with this patch.
> but anyways, many for the fix, the work and your efforts.
> best regards.

Thanks for the testing, We'll hopefully approve this for 3.0.x in the next couple of days and get it in for the 3.0.1 release.
Comment 19 abittner 2010-01-05 06:12:23 PST
i have "my" crash back, but the signature hints for a different bug, but still in the nntp area ;(

198a8c58-9321-4615-a0dc-554832100105

some different source-code area.
i have commented the crash.
greetings.
Comment 20 Wayne Mery (:wsmwk, NI for questions) 2010-01-05 06:49:57 PST
(In reply to comment #19)
> i have "my" crash back, but the signature hints for a different bug, but still
> in the nntp area ;(
> 
> 198a8c58-9321-4615-a0dc-554832100105
> 
> some different source-code area.

yeah, different crash. but it it may indicate this patch isn't good or sufficient. Jcranmer might be able to say

bp-198a8c58-9321-4615-a0dc-554832100105 
is nsNNTPProtocol::DisplayArticle(nsIInputStream*, unsigned int)
Comment 21 Mark Banner (:standard8) 2010-01-10 06:37:36 PST
Comment on attachment 419800 [details] [diff] [review]
updated

From the comments etc, this sounds like it will improve the situation for most people. Whether or not it'll expose bug 523811 more, I don't know, but as this patch should resolve two fairly significant crashers, I think we can take it for 3.0.1
Comment 22 Blake Winton (:bwinton) (:☕️) 2010-01-10 08:09:45 PST
Landed on 1.9.1 as <http://hg.mozilla.org/releases/comm-1.9.1/rev/cee9ce3fbe35>.
Comment 23 Wayne Mery (:wsmwk, NI for questions) 2010-01-10 17:05:17 PST
virtually non-existent on trunk, 3.0pre and 3.0.1pre (4/month). so can't verify this is gone without a testcase.  But at ~30 crashes/day for 3.0.0 we should know if it's gone within ~48 hours after 3.0.1 ships.

#15 crash for 3.0.0

Note You need to log in before you can comment on or make changes to this bug.