Open Bug 1003073 Opened 6 years ago Updated 5 years ago

ChatZilla stops working and notes "potential abnormality"

Categories

(Other Applications :: ChatZilla, defect)


Tracking

(Not tracked)

People

(Reporter: aditsu, Assigned: rginda)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36

Steps to reproduce:

I use ChatZilla to chat on freenode. I usually keep it running on my laptop for long periods of time. This issue happens after doing the following things a few times:
- suspend the laptop to ram and turn it on again later
- lose and regain a wifi connection
- connect to a different ssid
I don't have an exact sequence of events to reproduce the bug, but it has happened many times in the last couple of years.
Current ChatZilla version is 0.9.90.1


Actual results:

ChatZilla appears to be connected, and I can see users in channels, but there is absolutely nothing happening - no chat, no joins and no exits. I can write messages but there is no reply. Commands like /ping or /whois similarly receive no reply. 

1 or 2 days after everything freezes, I get the following message in the current channel:
[WARNING]	ChatZilla has detected a potential abnormality in its internal data. You will not be able to send any form of communication at this time, although it might appear you can. The most likely cause is Mozilla Bug 318419 <https://bugzilla.mozilla.org/show_bug.cgi?id=318419>. You are strongly advised to restart the host application (Firefox) to prevent further problems.

If I try to disconnect from freenode (right-click on the freenode tab, disconnect), nothing happens. If I try a second time, I get:
[ERROR]	Internal error dispatching command “disconnect”.
[ERROR]	Not Connected.

After restarting ChatZilla, everything is back to normal.


Expected results:

If ChatZilla is connected, it should work normally. If it's not connected, it should detect the disconnection and attempt to reconnect.
The message about bug 318419 is triggered when the send queue (outgoing messages) reaches 1000 items, which is highly unusual and typically means that the code to process events has stopped running. Assuming that it is not bug 318419, I can only guess that something is preventing code from running as expected. I'd suggest opening Firefox's Browser Console (Control-Shift-J on Windows) when this next happens and checking whether there are any messages mentioning "ChatZilla" at all. You don't need to wait until this warning is shown; just check as soon as you are certain messages are not being sent or received.
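To make the trigger condition concrete, here is a minimal sketch of a send queue with a length check. It is not ChatZilla's actual code: `WARN_LIMIT`, `makeServer`, `queueLine` and `sendOne` are hypothetical names, and only the threshold of 1000 comes from the comment above.

```javascript
// Illustrative sketch only; NOT ChatZilla's actual implementation.
// All names here (WARN_LIMIT, makeServer, queueLine, sendOne) are hypothetical.
const WARN_LIMIT = 1000; // queue length at which the warning is said to fire

function makeServer() {
  return { sendQueue: [], warned: false, warnings: [] };
}

// Queue an outgoing line; record a single warning once the backlog hits the limit.
function queueLine(server, line) {
  server.sendQueue.push(line);
  if (!server.warned && server.sendQueue.length >= WARN_LIMIT) {
    server.warned = true;
    server.warnings.push(
      "potential abnormality: " + server.sendQueue.length + " lines queued"
    );
  }
}

// What a working event pump would do on each step: send the oldest line.
function sendOne(server) {
  return server.sendQueue.shift();
}

// Stalled pump: lines are queued but sendOne() never runs.
const stalled = makeServer();
for (let i = 0; i < WARN_LIMIT; i++) queueLine(stalled, "PING " + i);
console.log(stalled.warnings.length); // 1
```

In this model, a pump that keeps calling `sendOne` never lets the queue grow, so the warning only appears once processing has stopped for a long time.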
I restarted CZ just a few minutes ago, but I had actually opened Firefox and checked the console before that, and didn't notice anything about ChatZilla.
Next time it happens I'll check the console as soon as I notice the problem, and report here.
Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 SeaMonkey/2.27a1 ID:20140214003001 c-c:1ce77c2f9bd0 m-c:d275eebfae04

ChatZilla 0.9.90.1 nightly 2013103000

I see similar symptoms, except that I've never let cZ run for days on end after noticing that it had stopped responding.

I use cZ-on-FxESR on moznet as tonymec|away and cz-on-SeaMonkey on both moznet and freenode as tonymec (I also use Konqueror as tonymec_KDE). I see the problem in SeaMonkey only but my SeaMonkey instance is much busier: compared to Firefox, 110 browser tabs vs. 30, 84 enabled extensions vs. 33, 5 email accounts vs. (of course) 0, 21 IRC channels on 2 networks vs. 14 channels on 1 network.

Sometimes I notice that cZ-on-Sm is apparently not aware that it has got a ping-timeout from both networks at approximately the same time, as can be ascertained by the last entry logged ("an hour ago", maybe) for high-throughput channels such as #developers on moznet and #defocus on freenode. I get the same inactivity as that described in comment #0 except that I haven't tried keeping the unresponsive cZ open for days on end — normally I restart SeaMonkey every day in order to rotate its logs (I keep logs of its stdout/stderr output).

The one thing I have found which works to reconnect cz-on-Sm is to close its window by means of the [X] button at far upper right (it asks by means of an alert popup if I really want to quit even though I'm "still connected", and I confirm), then launch it again via the cZ button on SeaMonkey's statusbar. (To launch cZ on Firefox I could use the "Tools → ChatZilla" menuitem, Alt+T Z; on SeaMonkey, the "Window → IRC Chat" menuitem, Alt+W I; on either, the cZ button if placed on a toolbar.) Of course this workaround assumes that there exists at least one browser or (on SeaMonkey) mail-news window.


Attempt at debugging:
In yesterday's stdout/stderr log, I notice the following lines; they are consecutive but I don't know whether they are relevant:

cz: Can't send to disconnected socket
cz: sendQueue flushed.
JavaScript strict warning: chrome://chatzilla/content/handlers.js, line 2793: reference to undefined property user.chanListEntry
cz: Error routing event channel.quit: *
 in onQuit
TypeError: user.chanListEntry is undefined
cz: my_removeFromList@chrome://chatzilla/content/handlers.js:2793
my_cquit@chrome://chatzilla/content/handlers.js:2883
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
serv_quit@chrome://chatzilla/content/lib/js/irc.js:2239
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
ep_stepevents@chrome://chatzilla/content/lib/js/events.js:312
mainStep@chrome://chatzilla/content/static.js:1512
@chrome://chatzilla/content/static.js:1514

cz: Error routing event channel.quit: *
 in onQuit
TypeError: user.chanListEntry is undefined
cz: my_removeFromList@chrome://chatzilla/content/handlers.js:2793
my_cquit@chrome://chatzilla/content/handlers.js:2883
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
serv_quit@chrome://chatzilla/content/lib/js/irc.js:2239
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
ep_stepevents@chrome://chatzilla/content/lib/js/events.js:312
mainStep@chrome://chatzilla/content/static.js:1512
@chrome://chatzilla/content/static.js:1514

cz: Error routing event channel.quit: *
 in onQuit
TypeError: user.chanListEntry is undefined
cz: my_removeFromList@chrome://chatzilla/content/handlers.js:2793
my_cquit@chrome://chatzilla/content/handlers.js:2883
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
serv_quit@chrome://chatzilla/content/lib/js/irc.js:2239
ep_routeevent@chrome://chatzilla/content/lib/js/events.js:244
ep_stepevents@chrome://chatzilla/content/lib/js/events.js:312
mainStep@chrome://chatzilla/content/static.js:1512
@chrome://chatzilla/content/static.js:1514
Status: UNCONFIRMED → NEW
Ever confirmed: true
The problem has happened again a couple of times, and I checked the console but couldn't find any messages related to ChatZilla.
Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0 SeaMonkey/2.27a1 ID:20140214003001 c-c:1ce77c2f9bd0 m-c:d275eebfae04

cZ 0.9.90.1 with /channel-pref tabLabel (on a channel) meaningful and equal to the value of extensions.irc.networks.NETWORK.channels.CHANNEL.tabLabel (e.g. extensions.irc.networks.moznet.channels.%23chatzilla.tabLabel)

I saw this bug today (about 12 hours ago but I noticed when I woke up). Strange that we're warned about something that was fixed in 2006.
P.S. I only restarted cZ so far (not the whole Suite, and by clicking the [x] at far topright in the cZ window frame then clicking the cZ button in the Browser) and I can type stuff again (and have it answered by other people).
(In reply to Tony Mechelynck [:tonymec] from comment #5)
> Strange that we're warned about something that was fixed in 2006.

The message simply indicates that things have broken and, at the time, that bug was the cause. Lots of people were hitting it so we needed to say what it was. Clearly, this time it is unlikely to be that bug.

Some possible options for checking and 'fixing' this are:
 * Show the current send queue length: /eval e.server.sendQueue.length
 * Show the notify list (indirect cause last time): /network-pref notifyList
 * Turn off away-check (possible indirect cause): /pref autoAwayCap 0
 * Forcibly reset send queue: /eval e.server.flushSendQueue()
 * Forcibly restart send queue: /eval e.network.eventPump.addEvent(new CEvent("server", "senddata", e.server, "onSendData"))

You should check what happens to the send queue length a few times when you get the warning. If the away-check is disabled, the send queue should only get longer when you try to talk or send other commands directly. If away-check is enabled, it will also increase periodically (typically 1 command per 2 minutes) for the away-check.
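As a rough worked example of the growth rate described above, the sketch below estimates how long a completely stalled queue would take to reach the warning threshold if only the away-check were adding to it. The two constants come from the comments in this bug; the names `QUEUE_WARN_LIMIT`, `AWAY_CHECK_PERIOD_MIN` and `hoursUntilWarning` are hypothetical.

```javascript
// Back-of-the-envelope sketch; the constants come from the comments above,
// the function and constant names are hypothetical.
const QUEUE_WARN_LIMIT = 1000;   // queue length that triggers the warning
const AWAY_CHECK_PERIOD_MIN = 2; // "typically 1 command per 2 minutes"

// Hours until the warning fires if nothing drains the queue and only the
// away-check adds to it, starting from the given queue length.
function hoursUntilWarning(currentLength) {
  const remaining = Math.max(0, QUEUE_WARN_LIMIT - currentLength);
  return (remaining * AWAY_CHECK_PERIOD_MIN) / 60;
}

console.log(hoursUntilWarning(0).toFixed(1)); // "33.3"
```

About 33 hours from an empty queue is roughly consistent with the reporter seeing the warning "1 or 2 days" after everything freezes, assuming the away-check is enabled and little else is queued.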

There are two options to try 'fixing' things when they break; you can reset the send queue, which discards everything queued up and will automatically restart the send queue for the next command sent, or you can directly restart the send queue. Restarting the send queue once the warning message has appeared may flood you out of the network or at least clog up the IRC connection for ages (sending 1 queued message every 1.5s) so you might want to only do this once and then restart CZ anyway - but check the send queue length a few times after doing it, to see if it is actually draining the queue or not.
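To put a number on "for ages", and to show one way of judging whether a restarted queue is really draining, here is a small sketch. The 1.5 s rate is taken from the comment above; `SEND_PERIOD_SEC`, `drainMinutes` and `isDraining` are hypothetical helpers, and the sample values stand in for readings taken by hand with /eval e.server.sendQueue.length.

```javascript
// Sketch under stated assumptions; helper names are hypothetical.
const SEND_PERIOD_SEC = 1.5; // "sending 1 queued message every 1.5s"

// Minutes needed to send a backlog of the given length at that rate.
function drainMinutes(queueLength) {
  return (queueLength * SEND_PERIOD_SEC) / 60;
}

// True if every successive queue-length sample is smaller than the one before,
// i.e. the queue is actually draining.
function isDraining(samples) {
  if (samples.length < 2) return false;
  for (let i = 1; i < samples.length; i++) {
    if (samples[i] >= samples[i - 1]) return false;
  }
  return true;
}

console.log(drainMinutes(1000));                 // 25
console.log(isDraining([1000, 980, 960]));       // true
console.log(isDraining([1000, 1000, 1001]));     // false
```

A full queue of 1000 items would therefore take about 25 minutes to clear, which is why resetting the queue and restarting CZ may be the more practical choice.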
In reply to comment #7: Thanks a lot, James, for the pointers to possible actions. I'll try them if I ever see the bug again; they are much less burdensome than closing the whole cZ window.

This is an erratic bug, and I haven't seen it recently, maybe because for other reasons I haven't had long enough, busy enough continuous sessions for it to appear. In particular, now that Linux nightlies are being published again for trunk SeaMonkey, I restart at least once a day (after installing the new nightly), and of course more often if there has been a crash (or a kernel panic or an AC circuit break, etc.) in the meantime. The fact that I have (temporarily?) retired cZ-on-Firefox, and that nowadays I only start Firefox itself for short times to load pages (such as Google Groups, Vox news, etc.) which are not properly rendered by SeaMonkey, thus somewhat reducing net bandwidth and memory swap, might also be a factor.