Closed Bug 536806 Opened 15 years ago Closed 15 years ago

crash [@ nsNntpCacheStreamListener::OnStopRequest(nsIRequest*, nsISupports*, unsigned int)]

Categories

(MailNews Core :: Networking: NNTP, defect)

1.9.1 Branch
x86
Windows Vista
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 531794

People

(Reporter: wsmwk, Unassigned)

Details

(Keywords: crash)

Crash Data

crash [@ nsNntpCacheStreamListener::OnStopRequest(nsIRequest*, nsISupports*, unsigned int)]

Andreas, are some of these your crashes?
 https://crash-stats.mozilla.com/report/list?version=Thunderbird%3A3.0&query_search=signature&query_type=exact&query=&date=&range_value=2&range_unit=weeks&do_query=&signature=nsNntpCacheStreamListener%3A%3AOnStopRequest%28nsIRequest*%2C%20nsISupports*%2C%20unsigned%20int%29&page=1

related to bug 531794?  Some comments indicate this crash didn't happen in v2.0.x

bp-c8da4361-ecbb-4c34-bcca-b5a192091224
check for mail on nntp server toplevel root element. then clicking around in nntp folders and messages (fully and partly downloaded), message display being lagged lots, .msf file access going crazy, 
0	thunderbird.exe	nsNntpCacheStreamListener::OnStopRequest	 mailnews/news/src/nsNNTPProtocol.cpp:687
1	thunderbird.exe	nsInputStreamPump::OnStateStop	netwerk/base/src/nsInputStreamPump.cpp:576
2	thunderbird.exe	nsInputStreamPump::OnInputStreamReady	netwerk/base/src/nsInputStreamPump.cpp:401
3	xpcom_core.dll	nsOutputStreamReadyEvent::Run	xpcom/io/nsStreamUtils.cpp:111
4	xpcom_core.dll	nsThread::ProcessNextEvent	xpcom/threads/nsThread.cpp:521
dont have much time atm for too many details, but i have done some testing with some simple newsgroups on public newsservers.

i have selected one group with some tens of thousands of available messages and a smaller group with only a few hundred or a few thousand messages.

i have set both groups to fully download all the messages/headers and store them locally (sync button from file, offline, download/sync now).

also i had some pop3/smtp account and some local/global folder tree structure in tb3.0

upon every starting of tb3.0 it restores the state of these few object trees, whether the newsgroups subtree or local folders were extended or not and so on.

so whenever the newsgroup tree had been extended in the previous tb3 session, and i start/restart tb3 it (newsgroup tree) gets expanded again.

upon every expanded newsgroup subtree, tb3 apparently walks (tcp connection to newsgroup server) through the newsgroup tree and checks the usenetserver whether it has new objects/messages (headers) to download and so on.

during this initial check, i have observed that tb3 goes lunatic on my disk (windows, with msft/sysinternals (filemon(itor)) tool) and goes through these index files .msf or something.

on other bigger productive tb3 setups i have a great deal of subscribed newsgroups, and it takes literally like half a minute or so to process all these .msf files and these initial procedures.

its especially during this busy time with the .msf files, when the user already impatiently tries to navigate through the expanded usenet folders to view already present or still-to-be-downloaded messages (headers downloaded only), that the tb3 client goes erratic. it seems to interpret all kinds of gui actions (mouseclick, selecting messages, selecting folders) in all kinds of wrong ways, especially when you are impatient and click on many messages or folders in order to bring the contents up.

sometimes it navigates folders and whole trees upwards and jumps back to the local and pop3 subtrees or arrives at the topmost element whatever kind of tree you have there. also sometimes it expands the tb3 menus or "clicks" at elements you never clicked, or arrives in the options gui dialog (rarely but i had all kind of really weird cases).

its also during these times when the crashes arrive, and it also seems to depend if you click many of the header-downloaded-only elements, or the already-completely-downloaded-elements (always speaking about usenet messages).

it seems to differ in the crash results and somehow relates to the onstoprequest and that other ondataavailable.

i have tried to make some sense of it, but couldnt really, as i am not really much of a developer on this level.

i have also experimented in the newsgroup folders with right-clicking and rebuilding the index of that newsgroup folder, and then quickly clicking on usenet elements again, which also leads to these two kinds of crashes.

these two bugs (ondataavailable and onstoprequest) seem somehow related, but maybe from a higher point of view or some other functions and processes obstructing some same set of functions, as there seem to be some buffers of mouse input events that seem to get applied at a much later point of time and crazy other places the user never intended and never did want to.

something else that comes to my mind is, that in these crash repro scenarios i was experimenting with, i also came across a long-time very odd behaviour that i have also observed ever since the thunderbird 2.x days, that sometimes (both usenet messages and local mail messages (pop3)) are being displayed in some odd-style raw-like fashion (i always select the view mode of messages as plain-txt, so no html/simple-html mode at all).

so sometimes in these tb3 scenarios where i try to make tb3 crash with these two kinds of errors, i see messages being displayed in some kind of raw mode (similar to ctrl+u raw view), but not completely like that: cr/lf seems to be missing and the whole usenet/mail message seems to be on one line. it is still displayed in the preview window on multiple lines due to the small size of the preview, but without respecting the actual cr/lf line breaks that come with the message headers or user data.

so anyways, this all reminds me of some bigger problems, maybe even in the architecture of the whole gui, the threads and the app in general. it often seems that the whole app is malfunctioning because some internal buffers are being worked on by the wrong threads, or because threads/functions dont finish their work on a buffer completely before some other thread or the next function already draws the results in the preview window.

these crashes in the nntp part of the current tb3.0 implementation look similar: whenever the .msf index file walkthrough upon expanding the nntp tree is really busy, and you already navigate through your nntp folders and messages instead of waiting patiently until the .msf walkthrough cycle has finished, constantly clicking on other nntp messages or switching to another nntp folder and so on, eventually some buffers inside the client overflow or some functions misinterpret data that was never intended for them, thus leading to the crash here.


the easiest way i was able to cause both types of these nntp crashes was to subscribe to either many groups on some nntp server and then clicking "get mail" many times when being on the top element of your nntp tree (name of nntp server/account itself), creating a big .msf-indexing-load and then hurrying a little bit and trying to make the tb3 client display some of the nntp messages and elements and clicking about the various message elements in one or multiple newsgroup subfolders and so forth.

so causing a bigger nntp load on the client seems to mess up some internal buffers or lists of actions and objects eventually leading on probably inappropriate commands and dataprocessing and faulty conditions.

its really not that hard to cause these two types of crashes in the tb3 client at all once you have subscribed to enough nntp groups and download/sync those headers or messages completely and "impatiently" click on your nntp messages and so forth.

also try to fetch/store/save all the nntp elements locally to generate some network load and network traffic, so the client is loaded and busy with real work, which will help you make the client choke on data, buffers and threads....


hope this helps a bit. all of this holds true for both of these bugs:
https://bugzilla.mozilla.org/show_bug.cgi?id=536806
https://bugzilla.mozilla.org/show_bug.cgi?id=531794

thanks and cheers.
(In reply to comment #1)
> so whenever the newsgroup tree had been extended in the previous tb3 session,
> and i start/restart tb3 it (newsgroup tree) gets expanded again.

This much is known for a while...

> upon every expanded newsgroup subtree, tb3 apparently walks (tcp connection to
> newsgroup server) through the newsgroup tree and checks the usenetserver
> whether it has new objects/messages (headers) to download and so on.

... ditto.

> during this initial check, i have observed that tb3 goes lunatic on my disk
> (windows, with msft/sysinternals (filemon(itor)) tool) and goes through these
> index files .msf or something.

If you set news.update_unread_on_expand to false, this action will stop, but it will prevent biff from updating as well.

Anyways.

> its also during these times when the crashes arrive, and it also seems to
> depend if you click many of the header-downloaded-only elements, or the
> already-completely-downloaded-elements (always speaking about usenet messages).

It seems to me that the crash is ultimately caused by a massive thread race resulting from trying to do too many offline reads. This may point to a root bug elsewhere, but I'll need more investigation.

> it seems to differ in the crash results and somehow relates to the
> onstoprequest and that other ondataavailable.

All of these bugs relate to the same core problem: mListener is being set to null before it should be. I'm not sure whether this is a data race or memory corruption (thread memory visibility?), but it is problematic.
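To make the mListener point concrete, here is a minimal sketch of the usual defensive pattern: take a local strong reference and null-check it before forwarding. It uses plain standard C++ with hypothetical `Listener`, `RecordingListener` and `ForwardingListener` types. It is not the actual nsNntpCacheStreamListener code or the patch from bug 531794, and it only covers the case where the member is cleared on the same thread; a real cross-thread race would need proper synchronization as well.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream listener interface (not nsIStreamListener).
struct Listener {
    virtual ~Listener() = default;
    virtual void OnStopRequest(int status) = 0;
};

// Hypothetical listener that records every status it receives.
struct RecordingListener : Listener {
    std::vector<int> statuses;
    void OnStopRequest(int status) override { statuses.push_back(status); }
};

// Hypothetical wrapper that forwards notifications to an inner listener.
class ForwardingListener {
public:
    explicit ForwardingListener(std::shared_ptr<Listener> target)
        : mListener(std::move(target)) {}

    // Simulates another code path clearing the member earlier than expected.
    void Disconnect() { mListener.reset(); }

    // Returns true if the notification was forwarded, false if no listener was set.
    bool OnStopRequest(int status) {
        // Local strong reference: keeps the target alive for the whole call,
        // even if mListener is cleared re-entrantly while we are forwarding.
        std::shared_ptr<Listener> listener = mListener;
        if (!listener) {
            return false;  // bail out instead of dereferencing null
        }
        listener->OnStopRequest(status);
        return true;
    }

private:
    std::shared_ptr<Listener> mListener;
};
```

A guard like this only hides the symptom (the null dereference); it does not explain why the member was cleared early.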

In any case, I need to reliably reproduce it myself before I can make any more judgments. The crash behind the previous band-aid was only reproduced once, to the aggravation of the fixer and the reviewers of said patch.

> these two bugs (ondataavailable and onstoprequest) seem somehow related, but
> maybe from a higher point of view or some other functions and processes
> obstructing some same set of functions, as there seem to be some buffers of
> mouse input events that seem to get applied at a much later point of time and
> crazy other places the user never intended and never did want to.

[ We do too much synchronous stuff, so a lot of logic runs on the GUI thread that probably really shouldn't be, but fixing that requires massive, massive architecture overhauls. ]
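As a rough illustration of what moving work off the GUI thread means, the sketch below runs a slow job on a worker thread and posts only the result back to a queue that the main thread drains. All names here (`MainThreadQueue`, `CountUnread`, `CountUnreadAsync`) are hypothetical and written in plain standard C++; this is not how Thunderbird's event loop or its XPCOM threading is implemented.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical queue of callbacks that must run on the "main" (GUI) thread.
class MainThreadQueue {
public:
    // May be called from any thread.
    void Post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mMutex);
        mTasks.push(std::move(task));
    }

    // Runs all pending tasks on the calling thread; returns how many ran.
    int Drain() {
        int ran = 0;
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (mTasks.empty()) break;
                task = std::move(mTasks.front());
                mTasks.pop();
            }
            task();  // run outside the lock so tasks may post new tasks
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mMutex;
    std::queue<std::function<void()>> mTasks;
};

// Hypothetical slow job, standing in for something like summary-file processing.
inline int CountUnread(int messages) {
    int unread = 0;
    for (int i = 0; i < messages; ++i) {
        if (i % 2 == 0) ++unread;  // pretend every other message is unread
    }
    return unread;
}

// Does the slow work on a worker thread, then posts the result to the queue.
// The caller must keep `queue` and `result` alive until the thread is joined
// and the queue has been drained.
inline std::thread CountUnreadAsync(MainThreadQueue& queue, int messages, int& result) {
    return std::thread([&queue, messages, &result] {
        int unread = CountUnread(messages);
        queue.Post([&result, unread] { result = unread; });
    });
}
```

The design choice is that shared state (`result` here) is only written from the thread that drains the queue, so the worker never touches it directly.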

> the easiest way i was able to cause both types of these nntp crashes was to
> subscribe to either many groups on some nntp server and then clicking "get
> mail" many times when being on the top element of your nntp tree (name of nntp
> server/account itself), creating a big .msf-indexing-load and then hurrying a
> little bit and trying to make the tb3 client display some of the nntp messages
> and elements and clicking about the various message elements in one or multiple
> newsgroup subfolders and so forth.

Good to know. I think the offline storage is also key here.

> hope this helps a bit. all of this holds true for both of these bugs:
> https://bugzilla.mozilla.org/show_bug.cgi?id=536806
> https://bugzilla.mozilla.org/show_bug.cgi?id=531794

They should probably be duped to each other, as they have the same root cause that we keep band-aiding.
my reference about the tree extension upon restart of the tb client was just to describe my scenario and testing settings exactly and to describe how i was looking at things and stuff the software was doing. that part wasnt meant as a bugreport or something :)


anyways, so i further figured (also reported on the other bugreport for ondataavailable) that when using keyboard navigation in the nntp newsgroups folders i can reproduce these two sorts of crashes even more reliably and successfully than when going lunatic with the mouse and clicking.

but beware, its not the keyboard cursor navigation with cursor up/down/left/right keys or something, but its these shortcut keyboard commands that are listed in the "go" menu, for message navigation, the keys "f", "n", "b", "t" and "p".

these keys give me a really easy and high rate of crashes, and i have described some hints and thoughts about what i think the difference between the onstoprequest and ondataavailable crashes is.

please read my posting in the other bugreport. hope this helps a bit.
regards.
The patch has landed in bug 531794. You can try a trunk build and comment in bug 531794 on the results, or wait for the patch to land in a 3.0.x build: ftp://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-1.9.1/
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Crash Signature: [@ nsNntpCacheStreamListener::OnStopRequest(nsIRequest*, nsISupports*, unsigned int)]