Closed Bug 695309 Opened 13 years ago Closed 13 years ago

Thunderbird sometimes marks entire newsgroups as unread (TB generates NNTP requests with bad formatting or in wrong order, so access to news server fails repeatedly)

Categories

(MailNews Core :: Networking: NNTP, defect)

defect
Not set
critical

Tracking

(thunderbird9-, thunderbird10+ fixed, thunderbird11 fixed, seamonkey2.6 wontfix, seamonkey2.7 fixed, seamonkey2.8 fixed)

RESOLVED FIXED
Thunderbird 12.0

People

(Reporter: bhearsum, Assigned: jcranmer)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

I've been experiencing this for the past couple of months...It always happens when I click on a newsgroup to read some new messages -- suddenly the entire newsgroup will be marked as unread (4000+ messages). Running Thunderbird 8.0 on Ubuntu 11.04. I don't see anything relevant in the error console.
Is there anything else I can look at to try and debug this?
Can you look at the newsrc file for the news server (it will be in a News sub-directory of your profile dir and will have a name of the form <hostname>.rc) and say what the line for the newsgroup that was marked unread looks like?
Here's the full contents of that file:
mozilla.dev.builds: 1-4579
mozilla.dev.planning: 1-17952
mozilla.dev.platform: 1-14462
mozilla.dev.tree-management: 1-11843
mozilla.governance: 1-3062
When I woke my laptop up this morning, Thunderbird hit the issue described for the builds and planning newsgroups. The other three weren't affected.
What size are these local folders on disk?
My "News" folder is 6.9 MB; is that what you're looking for?
(In reply to Ben Hearsum [:bhearsum] from comment #3)
> Here's the full contents of that file:
> mozilla.dev.builds: 1-4579
> mozilla.dev.planning: 1-17952
> mozilla.dev.platform: 1-14462
> mozilla.dev.tree-management: 1-11843
> mozilla.governance: 1-3062
>
> When I woke my laptop up this morning, Thunderbird hit the issue described
> for the builds and planning newsgroups. The other three weren't affected.

Those look like the messages should all be read, not unread. The only other place read state is stored is in the .msf files for the newsgroups, though we're supposed to prefer/trust the newsrc file. If you repair one of the folders (folder properties on the newsgroup), does it fix the unread state? (Doing so will cause us to redownload all the headers, and any messages you've stored for offline use.) I don't think file sizes are relevant here.
(In reply to David :Bienvenu from comment #6)
> (In reply to Ben Hearsum [:bhearsum] from comment #3)
> > Here's the full contents of that file:
> > mozilla.dev.builds: 1-4579
> > mozilla.dev.planning: 1-17952
> > mozilla.dev.platform: 1-14462
> > mozilla.dev.tree-management: 1-11843
> > mozilla.governance: 1-3062
> >
> > When I woke my laptop up this morning, Thunderbird hit the issue described
> > for the builds and planning newsgroups. The other three weren't affected.
>
> Those look like the messages should all be read, not unread. The only other
> place read state is stored is in the .msf files for the newsgroups, though
> we're supposed to prefer/trust the newsrc file. If you repair one of the
> folders (folder properties on the newsgroup), does it fix the unread state?

I'm not 100% sure what "fix" is meant to mean here. When I repair one of the newsgroups, it prompts to download all headers or N headers. Whichever one I choose, it seems to clear the state of the folder, download the requested ones, and mark them all as unread.
OK, doesn't sound like we used the newsrc line at all. Perhaps someone changed the way this works and I'm behind the times. Cc'ing jcranmer...
Seeing this as well. Is there anything I can do to help find the cause?
I've also started seeing this. My first thought is that we've suddenly decided to throw away the .msf files, but I clearly see the Replied arrows. Interestingly, though, it only seems to happen on two of my Usenet accounts (my most heavily-used ones, though), and it doesn't happen for all of the groups. If I remember, I'll try to cover this with a msgdb + nntp log to see if that gives any insight.
Note that it didn't happen with Thunderbird 7 or before.
When it happens, quitting and restarting TB still shows the newsgroup as entirely unread, until one clicks on the group name, which marks the entire group as read.
Same here (TB 8 on OS X 10.6.8). This behavior was introduced with TB 8; it never happened to me before.
I deleted my newsrc-<news-group-name> file but NOT the respective .msf files and re-subscribed to the groups. While doing so I noticed that TB doesn't actually write any modifications to the newsrc-<news-group-name> file until you close TB.
Ahh sorry, make that newsrc-<news-server-or-account-name> instead of newsrc-<news-group-name>.
Hi, confirming this bug in version 8; everything worked fine in version 7. In my case, not all messages in the group are marked as unread, but "only" a few hundred, going back roughly 2 years in the history. It does not affect all groups, and its behavior seems somewhat "random". Sometimes it helps to close TB and reopen it; sometimes I have to restore the <server>.rc file from a backup to get back the right (old) state. This *.rc file is written at closing time, so a workaround can be:
1) save the *.rc file when the problem arises
2) close TB
3) replace *.rc with the previously saved copy
4) open TB again and pray (not tested 100%)
I'm able to help with solving this issue: we have a local news server here, so maybe it can provide some additional information.
In order to isolate this: has anyone seen this bug on Win7 64-bit? It was initially reported as platform:all but I'm not so sure about that. One of the duplicates is reported against 64-bit Linux, the other against WinXP. The reporter here says he's on Ubuntu 11.04, and I personally have only experienced the bug on OS X.
Sorry for not having mentioned it before: I'm seeing this on Vista 64-bit.
Seeing it on Win7-64bit.
Seeing this on Mac OS X 10.7.2, happened after upgrading from TB7 to TB8
(In reply to marcel from comment #19)
> In order to isolate this: has anyone seen this bug in Win7 64bit?
> It was initially reported as platform:all but I'm not so sure about that.
> One of the duplicates is reported against 64bit Linux the other against
> WinXP. The reporter here says his on Ubuntu 11.04 and I personally only
> experienced the bug on OS X.

I'm experiencing this on Windows 7 64-bit. As others have noted, this started with TB 8. I have reverted to TB 7 as this problem makes following newsgroups difficult at best.
I'm adding my name to this problem as well. I am on Mageia1 Linux 64bits.
OK, experienced it on Win7 64-bit, too. I also noticed that the content of the .rc files is odd:
<news-group-name>: 1-216,218-573,575-597,599-693
<news-group-name>: 1-7,10-355,357-758
Why would there be gaps in the ranges although I said "Mark Newsgroup Read"? Note that with TB 7 there's just a single range for each group, from 1 to <index-of-last-message>.
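To make the "gaps" in these .rc lines easier to see, here is a minimal sketch of reading one newsrc line into ranges and listing the article numbers that fall between them. It assumes only the `group: range,range` layout quoted in this thread; the function names are hypothetical and this is not Thunderbird's own parser.

```python
def parse_newsrc_line(line):
    """Split 'group: 1-216,218-573' into (group, [(1, 216), (218, 573)])."""
    group, _, ranges_text = line.partition(":")
    ranges = []
    for part in ranges_text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A bare number such as "15145" is a one-article range.
        ranges.append((int(low), int(high or low)))
    return group.strip(), ranges


def missing_articles(ranges):
    """Return the article numbers that sit in the gaps between read ranges."""
    gaps = []
    for (_, prev_high), (next_low, _) in zip(ranges, ranges[1:]):
        gaps.extend(range(prev_high + 1, next_low))
    return gaps


group, ranges = parse_newsrc_line("example.group: 1-216,218-573,575-597,599-693")
print(group, missing_articles(ranges))
```

A line with a single range, as reported for TB 7, yields no gaps at all.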
I am new to this process, just want to point out how severe this is. I have been using Thunderbird for years now and very happy with it, but this bug makes it completely unusable. I (and several other people at my work place) are faced with finding another newsreader, because this bug is a complete show stopper, and from the status above, looks like no one even assigned to it?? The "normal" classification certainly doesn't fit what I see - for me this is complete loss of function, dead in the water.
Given all of the reports of this, I'm going to raise the severity. I realize Thunderbird devs are very busy already, but comment #26 is right -- this bug makes Newsgroups unusable in Thunderbird.
Severity: normal → critical
I'm also surprised that this has not been given more of a priority. I am involved with the LibreOffice project and a number of us (active members) have also discussed this on the LibreOffice mailing lists. It is not really a good advertisement for TB's use as a newsgroup reader. I have switched to the LibreOffice mailing list rather than staying on their Gmane news and hoping the fix will come in soon. I have also used TB as my main newsgroup reader for years as a work tool. I am no longer using it as my newsreader as it is too frustrating to use at this point. I may be forced to switch to another reader soon unless the fix comes in. Ben -- thanks for raising the level and nudging the devs.
If you want to help fix this bug, providing an NNTP/msgdb log of TB when this happens would be most useful, or, alternatively, a more reliable step to reproduce than "occasionally this happens."
Answer to comment #29: 1. There's absolutely nothing that's being written to the console on my system concerning this bug. 2. How many changes went into the TB's NNTP subsystem between TB7 and TB8? Since that subsystem appeared to be close to death for long I suspect there aren't that many? With that list at hand I suspect it should be rather easy to identify a few suspects?
The logs would need to come from <https://wiki.mozilla.org/MailNews:Logging>, not the error console.
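As a rough illustration of what that wiki page asks for, the sketch below builds an environment with the NSPR logging variables set before Thunderbird is launched. The module list `NNTP:5,MSGDB:5,timestamp`, the log path, and the `thunderbird` executable name are assumptions for this example; check the wiki page for the exact values for your platform.

```python
import os


def logging_environment(log_path, modules="NNTP:5,MSGDB:5,timestamp", base_env=None):
    """Return a copy of the environment with NSPR logging switched on."""
    env = dict(os.environ if base_env is None else base_env)
    env["NSPR_LOG_MODULES"] = modules  # which modules to log, and at what level
    env["NSPR_LOG_FILE"] = log_path    # where the log is written
    return env


env = logging_environment("/tmp/nntp.log")
# To launch with logging enabled (executable name is an assumption):
# import subprocess
# subprocess.Popen(["thunderbird"], env=env)
```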
Logged NNTP:5,MSGDB:5,timestamp as requested. I don't know a way to reproduce the issue on demand, so the log file is huge. Searching for "ERROR" in this file, I can find lots of sequences like the following one (but there are many more such sequences than the number of times the problem seems to happen, so I'm not sure this is relevant):
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) ClosingConnection
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) Sending: QUIT
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) ClosingSocket()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) CleanupAfterRunningUrl()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) creating
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) initializing, so unset m_currentGroup
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) setting busy to 1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) ParseURL
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) opening connection to news.free.fr on port 119
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) setting busy to 1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) ParseURL
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) m_messageID =
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) group = comp.soft-sys.math.scilab
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565930) m_key = -1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) ClosingConnection
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) Sending: QUIT
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) ClosingSocket()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) CleanupAfterRunningUrl()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) creating
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) initializing, so unset m_currentGroup
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) setting busy to 1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) ParseURL
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) opening connection to news.free.fr on port 119
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) setting busy to 1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) ParseURL
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) m_messageID =
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) group = comp.lang.tcl
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565aa0) m_key = -1
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) ClosingSocket()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) CleanupAfterRunningUrl()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5564f20) destroying
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) ClosingSocket()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) CleanupAfterRunningUrl()
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) setting busy to 0
2011-12-03 08:31:01.848000 UTC - 0[2730140]: (5565090) destroying
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) Next state: NNTP_RESPONSE
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) Receiving: 400 Cannot connect to NNTP server 212.27.60.38 (212.27.60.38:119), connect error 10060
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) Next state: NNTP_LOGIN_RESPONSE
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) Next state: NNTP_ERROR
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) ClosingConnection
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) Sending: QUIT
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) ClosingSocket()
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) CleanupAfterRunningUrl()
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) setting busy to 0
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) ClosingSocket()
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) CleanupAfterRunningUrl()
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) setting busy to 0
2011-12-03 08:31:22.846000 UTC - 0[2730140]: (5565930) destroying
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) Next state: NNTP_RESPONSE
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) Receiving: 400 Cannot connect to NNTP server 212.27.60.38 (212.27.60.38:119), connect error 10060
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) Next state: NNTP_LOGIN_RESPONSE
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) Next state: NNTP_ERROR
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) ClosingConnection
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) Sending: QUIT
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) ClosingSocket()
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) CleanupAfterRunningUrl()
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) setting busy to 0
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) ClosingSocket()
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) CleanupAfterRunningUrl()
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) setting busy to 0
2011-12-03 08:31:22.861000 UTC - 0[2730140]: (5565aa0) destroying
I'm seeing the issue with comp.soft-sys.math.scilab, which is the first newsgroup in the displayed list. It also happens for the second listed group, but much less often. I have never seen the issue for the third displayed group.
Another run. Initial situation: all groups entirely read. Clicked on the newsgroup title --> entire group marked as unread. From the timestamps I think the relevant extract of the log file is:
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) setting busy to 1
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) ParseURL
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) setting busy to 1
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) ParseURL
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) m_messageID =
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) group = comp.soft-sys.math.scilab
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) m_key = -1
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) Next state: SEND_FIRST_NNTP_COMMAND
2011-12-03 12:42:47.933000 UTC - 0[2530140]: (74e00c0) Sending: GROUP comp.soft-sys.math.scilab
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: NNTP_RESPONSE
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Receiving: 211 501 18675 19175 comp.soft-sys.math.scilab
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: SEND_FIRST_NNTP_COMMAND_RESPONSE
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: SETUP_NEWS_STREAM
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: NNTP_XOVER_BEGIN
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) SetCurrentGroup to comp.soft-sys.math.scilab
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: NNTP_FIGURE_NEXT_CHUNK
2011-12-03 12:42:47.996000 UTC - 0[2530140]: (74e00c0) Next state: NEWS_PROCESS_XOVER
2011-12-03 12:42:48.011000 UTC - 0[2530140]: (74e00c0) Next state: NEWS_DONE
2011-12-03 12:42:48.011000 UTC - 0[2530140]: (74e00c0) Next state: NEWS_FREE
2011-12-03 12:42:48.011000 UTC - 0[2530140]: (74e00c0) CleanupAfterRunningUrl()
2011-12-03 12:42:48.011000 UTC - 0[2530140]: (74e00c0) setting busy to 0
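Because these logs are huge, it can help to reduce them to the wire traffic per connection. The following is a small sketch that assumes the line layout seen in the extracts in this thread (timestamp, `UTC -`, thread id, connection pointer in parentheses, message); the regular expression and function name are made up for this illustration.

```python
import re
from collections import defaultdict

# Layout assumed from the extracts in this thread:
# "<date> <time> UTC - <n>[<thread>]: (<connection>) <message>"
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) UTC - "
    r"\d+\[[0-9a-f]+\]: \((?P<conn>[0-9a-f]+)\) (?P<message>.*)$"
)


def traffic_by_connection(lines):
    """Keep only 'Sending:' / 'Receiving:' messages, grouped by connection."""
    traffic = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a recognisable log line
        message = match.group("message")
        if message.startswith(("Sending: ", "Receiving: ")):
            traffic[match.group("conn")].append(message)
    return dict(traffic)
```

Feeding a log file through `traffic_by_connection(open(path))` leaves one short command/response list per connection, which makes it easier to see which response followed which command.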
Hi, I have a full log from TB NNTP, tcpdumped traffic, saved state of the profile files, and the recorded user actions for those states. All of it comes to around 60 MB; should I upload it here or only send a link to download it as an archive? Hope it will help, Fosfor. PS: it seems to be triggered by a lost connection. After that I always have some "unread" messages.
Weird is this part of the tcpdump:
GROUP bazar.hardware.nabidka
200 localnews.sh.cvut.cz InterNetNews NNRP server INN 2.4.3 ready (posting ok).
211 1607 199150 200919 bazar.hardware.nabidka
GROUP info.skola.fa
211 13 3088 3100 info.skola.fa
XOVER 199150-200919
224 199150-200919 fields follow
.
Shouldn't XOVER use the water marks from the range given by the 211 response to the GROUP command? The weird range 199150-200919 spans 1770 article numbers, which is the number of "unread" messages reported by TB for the group info.skola.fa.
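The arithmetic in this comment can be checked with a short sketch. It assumes the standard NNTP layout of a successful GROUP reply, `211 count low high group`; the names `GroupStatus`, `parse_group_response`, and `xover_span` are invented for the example.

```python
from typing import NamedTuple


class GroupStatus(NamedTuple):
    estimated_count: int
    low: int
    high: int
    name: str


def parse_group_response(line):
    """Parse a '211 count low high group' reply to a GROUP command."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "211":
        raise ValueError("not a successful GROUP response: %r" % line)
    return GroupStatus(int(fields[1]), int(fields[2]), int(fields[3]), fields[4])


def xover_span(low, high):
    """Number of article numbers covered by 'XOVER low-high' (inclusive)."""
    return high - low + 1


nabidka = parse_group_response("211 1607 199150 200919 bazar.hardware.nabidka")
skola = parse_group_response("211 13 3088 3100 info.skola.fa")
print(xover_span(nabidka.low, nabidka.high))  # span taken from the first group
print(skola.high < nabidka.low)               # the second group's numbers never reach it
```

The XOVER range in the dump matches the water marks of the first group, not those of the group that was selected last.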
From what I can tell from the info I have available, which isn't very much, it seems that Thunderbird is *not* using the read state from the news rc file. But when I tried it here, by deleting my .msf file for a newsgroup but leaving the newsrc file alone, TB did get the read state from the newsrc file, and after downloading all the headers, showed the newsgroup is read. Ben, when this happens to you, do you see Thunderbird re-downloading all the messages from the server again, for the particular group? Or does it just show them all as unread? If the latter, I would suspect that Thunderbird was unable to read the newsrc file/read set for particular group for some reason, and thus thought all the messages were unread. What's weird is that no developer is seeing this, which makes me suspect something specific to some users' profiles/system. It does not seem to be server-dependent, if Ben is seeing it with the mozilla news server, which is the news server devs use most.
I reported this a while ago - and saw it happen several times to the same newsgroup, and to many newsgroups. But it has stopped happening to me. Forgive me for not being sure if it has ever happened with the 9 beta I am now running, but I haven't seen it in a while.
The number of messages misreported as "unread" is not equal to the group size (at least not in all cases). In my test I was getting two different numbers of "unread" messages for different groups with different post counts.
Some points from my experimentation:
1. Putting a laptop to sleep with Thunderbird open is either part of the cause of this issue or a factor which makes it much more prevalent (at least for me).
2. This bug doesn't appear to be *caused* by a bad newsrc, but it can result in a bad newsrc.
3. Restarting Thunderbird immediately after the bug occurs may fix that occurrence of the bug (i.e., restore your original read states). If you wait long enough (>10 min, I think), a restart no longer fixes it.
(In reply to Joshua Cranmer [:jcranmer] from comment #39)
> 1. Putting a laptop to sleep with Thunderbird open is either part of the
> cause of this issue or a factor which makes it much more prevalent (at least
> for me).

IMHO the real cause is network connection loss (OS sleep, FW blocked, tunnel closed (my case),...). Is there any special task which is done when connection is lost except the "Could not connect..." message? This can be the source of problems.
(In reply to Fosfor from comment #38)
> Number of "unread" messages misreported is not equal with the group size (at
> least not in all cases). I my test I'was geting 2 numbers of "unread"
> messages for different groups with different post counts.

I can confirm this (see comment #25).
(In reply to Fosfor from comment #40)
> IMHO the real cause is network connection loss (OS sleep, FW blocked, tunnel
> closed (my case),...). Is there any special task which is done when
> connection is lost except the "Could not connect..." message? This can be
> the source of problems.

I tried this out and confirm it is what happens for me: in my case OS sleep does it. If I close Thunderbird before sleep and open it again after wake, I don't get the problem. I use sleep regularly, and it has always worked fine until this latest version of Thunderbird.
As a data point I'm seeing this on my work computer that I never shut down and where Thunderbird is running 24/7.
I'm now running TB 9 beta, did not suspend my laptop, and still had it happen on a newsgroup where I expected to have only a couple of unread messages. TB had been running in the background for some time. When I returned to it, it seemed to think deeply about what to show, then suddenly had lots of unreads in the group. Not all messages were unread, just a (very) large number in that group. Other groups were unaffected. My laptop was connected to the wireless LAN at that time. I did not notice network interruptions at the time.
I erroneously reported above that I wasn't seeing this with 9. It just happened again with 9. For what it's worth, I had been at a site which I suspect was blocking NNTP traffic: I could not connect to either of the two NNTP servers I use. Once back home, one of the servers gave me the everything-unread symptom on one group, and claimed it needed to download thousands of headers on another group. Nothing was actually downloaded.
Some negative info. I had another occurrence this morning - another group at first update after the above failure to connect. I copied the .rc, the msf and dat, and the hostinfo.dat from my overnight backup and started thunderbird again - and did NOT see the failure. What other state could be involved?
When TB is started with no connection and the connection is restored while TB is open, the bug is triggered too.
FWIW, I'm seeing this problem on a 32-bit Linux laptop running Lucid, installed from the PPA:
$ dpkg-query --show thunderbird
thunderbird 8.0+build1-0ubuntu0.10.04.1~mts2
We don't need any more "me too" comments on this bug: we already know this bug exists, and we are currently trying to track down the causes. Some brief updates from more accidental testing:
1. newsrcLine internally confirmed to be reset prior to the newsgroup becoming marked all read. [Thus, the internal readset is what is getting corrupted.]
2. Some newsgroups don't get reset to just "1": I saw 1-571 as the reset value for a newsgroup (only ~400 messages with high water around 5700).
3. Inspection of nsNNTPNewsgroupList seems to suggest that the best way to trash the read set is to call nsMsgKeySet::SetLastPossible with a low value (like 1): everything above that value gets deleted. If the highwater mark of the newsgroup got corrupted, that would almost certainly be what's causing this issue.
If the current hypothesis is indeed correct, the following should verify it:
var Ci = Components.interfaces;
var folder = Components.classes["@mozilla.org/rdf/rdf-service;1"].getService(Ci.nsIRDFService).GetResource("news://news.mozilla.org/mozilla.dev.apps.thunderbird").QueryInterface(Ci.nsIMsgNewsFolder);
folder.newsrcLine + " [high water]:" + folder.QueryInterface(Ci.nsIMsgFolder).msgDatabase.dBFolderInfo.highWater
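The hypothesis in point 3 can be pictured with a simplified model. This is not the nsMsgKeySet code; it only illustrates, under the assumption that a read set is a list of inclusive ranges, how clamping that set to a too-low "last possible" article number throws away almost everything. The function names are hypothetical.

```python
def clamp_read_set(ranges, last_possible):
    """Drop every read article number above last_possible (simplified model)."""
    clamped = []
    for low, high in ranges:
        if low > last_possible:
            continue  # the whole range lies above the limit
        clamped.append((low, min(high, last_possible)))
    return clamped


def format_ranges(ranges):
    """Render ranges the way they appear on a newsrc line."""
    return ",".join(str(low) if low == high else "%d-%d" % (low, high)
                    for low, high in ranges)


print(format_ranges(clamp_read_set([(1, 4579)], 1)))    # limit corrupted to 1
print(format_ranges(clamp_read_set([(1, 5700)], 571)))  # limit corrupted to 571
```

With a sane limit at or above the real high-water mark, the read set passes through unchanged.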
Re: "me too" -- I hadn't seen any other mention of 32-bit Linux as an affected platform. One thing to investigate: when I see this issue show up, but the affected group is not *entirely* unread, there seems to be some correlation with groups where I have "killed" threads (using the 'K' key, as for spam); these groups also have "holes" in the newsrc file for the message IDs masked by the "kill". I'm not 100% confident of this, however.
(In reply to Joshua Cranmer [:jcranmer] from comment #49)
> If the current hypothesis is indeed correct, the following should verify it:
> var Ci=Components.interfaces; var
> folder=Components.classes["@mozilla.org/rdf/rdf-service;1"].getService(Ci.
> nsIRDFService).GetResource("news://news.mozilla.org/mozilla.dev.apps.
> thunderbird").QueryInterface(Ci.nsIMsgNewsFolder); folder.newsrcLine + "
> [high water]:" +
> folder.QueryInterface(Ci.nsIMsgFolder).msgDatabase.dBFolderInfo.highWater

I just ran this code (using mozilla.dev.platform instead of the thunderbird one) and got the following:
mozilla.dev.platform: 1-13006,15145 [high water]:15148
I ran it directly after clicking that newsgroup and hitting the symptoms in comment #0.
Status update:
mozilla.mozillians: 1 [high water]:0
Judging from where the highwater mark may be set, this confirms (barring something more drastic like more total memory corruption) that bad data is being passed in, specifically 0.
FWIW - I just noticed that all articles newer than the currently selected one were marked as read. I have earlier on occasion seen that not all articles were changed to unread. This time I had an older article selected when my laptop took a nap.
Maybe this bug is related to the "compression" option? TB sometimes asks me if I want to compress the forum posts; I answer yes, and then the message list is empty! I have to go to another folder and back to the forum to see the messages, which are not marked as unread yet... but that happens a little later.
(In reply to Paul TOH from comment #54)
> may be this bug related to the "compression" option ?
>
> TB sometimes ask me if I want to compress the forum posts, I answer yes,
> then the message list is empty ! I have go to an other folder and back to
> the forum to see the messages not marked as unread yet...but it happends a
> little later.

I never compress folders (at least as far as I know) and this happens to me too.
Today I have unread messages in an IMAP account for the first time. It is an account with very little traffic, with no activity in the last month. 92 of 93 messages in the sent folder are unread (only the oldest one, from 3.11.2007, is read), and both messages in the trash are unread (both more than 2 years old). Can it be the same issue, or is it some flaw in the IMAP server?
Wireshark has confirmed the actual cause of this bug. The following is a TCP stream log:
GROUP mozilla.dev.apps.calendar
200 news.mozilla.org
GROUP mozilla.dev.builds
211 5641 2 5642 mozilla.dev.apps.calendar
XOVER 5554-5642
211 4651 2 4652 mozilla.dev.builds
HEAD 5554
224 xover information follows
4652.FF some mozconfig options.xunxun <xunxun1982@gmail.com>.Sun, 18 Dec 2011 22:43:42 +0800.<mailman.2054.1324219435.31724.dev-builds@lists.mozilla.org>..3336.22.Xref: number.nntp.dca.giganews.com mozilla.dev.builds:4652
.
HEAD 5555
423 no such article in group
HEAD 5556
423 no such article in group
In this exchange, the server logon "200 news.mozilla.org" is being treated as the response to the GROUP command; when sscanf tries to extract a number from news.mozilla.org, it fails and sets the high-water mark to 0. The client then moves on to the next group, and again reads the wrong data. Then, when the client downloads again for the group, it runs the command again and finds the correct result, which results in everything being marked as read.
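The misparse described here can be imitated in a few lines. The sketch below is only a rough model of a `sscanf`-style scan: it assumes the numbers start out as 0 and that scanning stops at the first token that is not an integer. It is not the parsing code from nsNNTPProtocol, and the function names are invented.

```python
def scan_leading_ints(text, count, default=0):
    """Read up to `count` leading integers; unfilled slots keep `default`."""
    values = [default] * count
    for index, token in enumerate(text.split()[:count]):
        try:
            values[index] = int(token)
        except ValueError:
            break  # like sscanf, stop at the first field that does not match
    return values


def high_water_from(response_line):
    """Treat the line as a GROUP reply: status code, then count, low, high."""
    _code, _, text = response_line.partition(" ")
    _count, _low, high = scan_leading_ints(text, 3)
    return high


print(high_water_from("211 5641 2 5642 mozilla.dev.apps.calendar"))  # real GROUP reply
print(high_water_from("200 news.mozilla.org"))                       # greeting misread as one
```

When the greeting is fed through the same path, no number is found and the high-water mark stays at the default of 0, matching the `[high water]:0` reading reported earlier in this bug.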
Is anybody chewing on this, in some forum not obviously linked from here? I can understand how the pace of work would drop through the floor for Xmas break, but it makes me worried that the bug is still unassigned...
Assignee: nobody → Pidgeot18
Status: NEW → ASSIGNED
As I have attached a Wireshark capture to the duplicate of this bug, shall I attach it here again, or is it enough to say that it exists? :-)
See bugs 702038 and 437930 for related symptoms (at least Pidgeot18 seems to believe that 702038 comes from the same cause and therefore is a dup of this).
Component: Folder and Message Lists → Networking: NNTP
Product: Thunderbird → MailNews Core
QA Contact: folders-message-lists → networking.nntp
So, here is a snippet of the NNTP log that causes this failure, annotated with how each line gets explained:
<call nsNNTPProtocol's constructor>
<line 329: SetIsBusy(false)>
---> setting busy to 0
<line 533: m_nntpServer->PrepareForNextUrl(this)> <-- oops!
<thread through to Initialize>
---> setting busy to 1
---> ParseURL
<No socket open? Open it up!>
---> opening connection to news.eternal-september.org on port 563
<Next, we load it>
---> setting busy to 1
---> ParseURL
---> m_messageID =
---> group = comp.lang.java.help
---> m_key = -1
<At this point, we actually open up the socket, so m_socketIsOpen is set to true>
<Unwind the stack, back to the constructor>
---> creating
---> initializing, so unset m_currentGroup
<The constructor exits, go back and load our original URL>
---> setting busy to 1
---> ParseURL
---> setting busy to 1
---> ParseURL
---> m_messageID =
---> group = comp.lang.java.programmer
---> m_key = -1
<In this run of LoadUrl, m_socketIsOpen is true, so we set next state to...>
---> Next state: SEND_FIRST_NNTP_COMMAND
---> Sending: GROUP comp.lang.java.programmer
---> Next state: NNTP_RESPONSE
---> Receiving: 200 mx04.eternal-september.org InterNetNews NNRP server INN 2.6.0 (20111104 snapshot) ready (posting ok)
<NNTP happily enjoys pipelining, so all of our responses are now exactly 1 behind.>
<This being behind causes spurious authentication failures, and misparsing of news values.>
<I suspect the recursive initialization also causes the various "not in a newsgroup" error too>

By contrast, here is the good one:
<call constructor>
---> setting busy to 0
---> creating
---> initializing, so unset m_currentGroup
<exits, etc.>

So, the big question: how do I reproduce the error? The answer is simple: create a connection when there is something else in the queue. Normally, this wouldn't happen, since the entire first load of the connection process happens in the same synchronous timestep, before anyone can load into the queue. However, we need a new event to see that the connection was dropped, so we end up creating a new connection after the queue has been filled.
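The "exactly 1 behind" effect can be shown with a toy model of a pipelined connection. The sketch assumes only that each command is paired with the next unread line from the server, and that a healthy client consumes the greeting before sending anything; `pair_responses` is a hypothetical helper, not part of the NNTP code.

```python
from collections import deque


def pair_responses(commands, server_lines, consume_greeting):
    """Pair each command with the next unread server line, in order."""
    incoming = deque(server_lines)
    if consume_greeting:
        incoming.popleft()  # a healthy connection reads the greeting first
    return [(command, incoming.popleft()) for command in commands]


server_lines = [
    "200 news.mozilla.org",
    "211 5641 2 5642 mozilla.dev.apps.calendar",
    "211 4651 2 4652 mozilla.dev.builds",
]
commands = ["GROUP mozilla.dev.apps.calendar", "GROUP mozilla.dev.builds"]

for command, reply in pair_responses(commands, server_lines, consume_greeting=False):
    print(command, "->", reply)
```

With `consume_greeting=False`, the first GROUP is answered by the greeting and the second GROUP by the first group's reply, which is the shifted pattern in the annotated log above.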
Attached patch: Reliable, reproducible test (obsolete)
This is a reliable test that forces the issue in question (it's not a fix yet). Debugging the issue led me to realize that our queue management for pending URLs is full of potential problems that I would like to fix all at once; however, I think I can fix this bug with a simple one-liner that should be more easily backportable to aurora/beta.
Attached patch: Quick fix to the problem
This is the quick fix that would eliminate the regression caused by bug 226890 part 8. I'll file another bug for making the URL queues more robust in the face of connection interruptions...
Attachment #587196 - Attachment is obsolete: true
Attachment #587424 - Flags: review?(dbienvenu)
I ran all the xpcshell tests with the patch applied, and the new test failed with an assertion. When I run the test by itself, it passes. I'm a little afraid this is going to introduce sporadic failures. Here's the failure info:
TEST-PASS | c:/builds/tbirdhq/objdir-tb/mozilla/_tests/xpcshell/mailnews/news/test/unit/test_bug695309.js | [test_newMsgs : 49] 8 == 8
2012-01-10 13:42:53 test.test INFO [Context: test.test:1 state: finished] Finished test: test_newMsgs
2012-01-10 13:42:53 test.test INFO [Context: test.test:2 state: started] Starting test: trigger_bug
TEST-INFO | (xpcshell/head.js) | test 3 pending
TEST-INFO | (xpcshell/head.js) | test 3 finished
Stopping server!
TEST-INFO | (xpcshell/head.js) | test 2 finished
FolderLoaded triggered for test.subscribe.empty!
###!!! ASSERTION: unknown error, but don't alert user.: 'errorID != UNKNOWN_ERROR', file c:/builds/tbirdhq/objdir-tb/mailnews/base/util/../../../../mailnews/base/util/nsMsgProtocol.cpp, line 469
xul!nsNNTPProtocol::OnStopRequest+0x0000000000000053 (c:\builds\tbirdhq\mailnews\news\src\nsnntpprotocol.cpp, line 1216)
xul!nsInputStreamPump::OnStateStop+0x00000000000000DE (c:\builds\tbirdhq\mozilla\netwerk\base\src\nsinputstreampump.cpp, line 581)
xul!nsInputStreamPump::OnInputStreamReady+0x00000000000000A2 (c:\builds\tbirdhq\mozilla\netwerk\base\src\nsinputstreampump.cpp, line 405)
xul!nsInputStreamReadyEvent::Run+0x000000000000004A (c:\builds\tbirdhq\mozilla\xpcom\io\nsstreamutils.cpp, line 115)
xul!nsThread::ProcessNextEvent+0x00000000000003A2 (c:\builds\tbirdhq\mozilla\xpcom\threads\nsthread.cpp, line 660)
xul!NS_InvokeByIndex_P+0x0000000000000027 (c:\builds\tbirdhq\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp, line 103)
xul!CallMethodHelper::Invoke+0x000000000000005B (c:\builds\tbirdhq\mozilla\js\xpconnect\src\xpcwrappednative.cpp, line 2899)
xul!CallMethodHelper::Call+0x00000000000000CF (c:\builds\tbirdhq\mozilla\js\xpconnect\src\xpcwrappednative.cpp, line 2230)
xul!XPCWrappedNative::CallMethod+0x00000000000001D4 (c:\builds\tbirdhq\mozilla\js\xpconnect\src\xpcwrappednative.cpp, line 2196)
xul!XPC_WN_CallMethod+0x000000000000025F (c:\builds\tbirdhq\mozilla\js\xpconnect\src\xpcwrappednativejsops.cpp, line 1540)
### ERROR: SymGetModuleInfo64: T0x0000000000373D66
### ERROR: SymGetModuleInfo64: T0x00000000FFFDE000
### ERROR: SymGetModuleInfo64: T0x00000000FFFFFF87
### ERROR: SymGetModuleInfo64: T0x0000000003760100
### ERROR: SymGetModuleInfo64: T
<<<<<<<
PROCESS-CRASH | c:\builds\tbirdhq\objdir-tb\mozilla\_tests\xpcshell\mailnews\new
But otherwise, the patch seems to behave well. I'm just checking that the test fails w/o the patch now.
ugh, the test fails w/o the patch, but only because ODA asserts that no data was read. So I need to try this in a release build, and my release build is crashing on startup. So it's going to take me a while to get this all sorted out.
Comment on attachment 587424
Quick fix to the problem

ok, I can't reproduce the test failure - we'll have to see if it shows up on tinderbox (but it shouldn't, since those are release builds)
Attachment #587424 - Flags: review?(dbienvenu) → review+
(In reply to David :Bienvenu from comment #66)
> ugh, the test fails w/o the patch, but only because ODA asserts that no data
> was read. So I need to try this in a release build, and my release build is
> crashing on startup. So it's going to take me a while to get this all sorted
> out.

Based on when I last looked at it, there is a failure hidden right before the assertion (indeed, the failure throwing past the async driver is probably causing the assertion). Neither bienvenu nor I could reproduce the other assertion/test failure again. I am aware that the test relies on several assumptions about how the internal connection logic is laid out, so it could be that one of my assumptions doesn't always hold except on my computer. That said, I suspect the failure is solely a test issue; the underlying fix should restore the behavior that existed before bug 226980 part 8.

With that in mind, the patch has been checked in: http://hg.mozilla.org/comm-central/rev/0b709763595d
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 12.0
I'm not quite familiar with the bug-fixing process, but does "Target Milestone: Thunderbird 12.0" mean the fix will reach ordinary mortals only after a year and a half (3 half-year releases)?
(In reply to Fosfor from comment #69) > I'm not quite familiar with bug-removing process, but this: > Target Milestone: Thunderbird 12.0 > means it will be working (for ordinary mortal) after one and a half year? (3 > half-year releases)? The patch has landed on the current Thunderbird trunk (or Thunderbird 12); Thunderbird 12 itself should be released around May 1 if I have my schedules correct (12 weeks after the next uplift, on January 31). That said, I am planning on requesting that this be backported to all of the branches given the severity of the bug; thus, most users should see this fixed by around January 31 at the latest.
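The release estimate in the comment above is plain date arithmetic. A minimal sketch, assuming the uplift date of January 31, 2012 and the 12-week figure quoted there; the function name `release_after_uplift` is purely illustrative and not part of any Mozilla tooling:

```python
from datetime import date, timedelta


def release_after_uplift(uplift, weeks_on_branches=12):
    """Return the date that falls the given number of weeks after an uplift."""
    return uplift + timedelta(weeks=weeks_on_branches)


# Uplift date taken from the comment above (the year 2012 is inferred from the log timestamps).
next_uplift = date(2012, 1, 31)
estimated_release = release_after_uplift(next_uplift)
print(estimated_release.isoformat())  # 2012-04-24
```

Under these assumptions the result is late April, which is consistent with the "around May 1" estimate.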
By coincidence I just found this blog post from :jcranmer concerning this bug: http://quetzalcoatal.blogspot.com/2012/01/how-bugs-get-fixed.html Interesting...and thanks for all your efforts. I also seriously hope this fix will be available with TB10.
OK, tnx to you both for the explanation and (above all) tnx Joshua for fixing this issue :)
Attachment #587424 - Flags: approval-comm-beta?
Attachment #587424 - Flags: approval-comm-aurora?
Attachment #587424 - Flags: approval-comm-beta?
Attachment #587424 - Flags: approval-comm-beta+
Attachment #587424 - Flags: approval-comm-aurora?
Attachment #587424 - Flags: approval-comm-aurora+
Running SeaMonkey 2.7b4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0) Gecko/20120119 Firefox/10.0 SeaMonkey/2.7

Two strange things so far. When I initially started 2.7b4, my third news account changed its name from the former value of news.individual.net (or just "individual.net") to rbg_sm+graysmail.comIndividual.netrbg_sm+graysmail.com. That's pretty ugly. I haven't tried to edit it back yet.

I went through the account server settings and re-enabled check at startup and periodic checking. When I did a Get Msgs on Individual.net, I went through yet another round of user/password prompting and then all messages were new again. It could very well be that I fell victim to the phasing and supplied the wrong info for the wrong account. I'm pretty sure I entered incorrect info for Individual.net and got re-prompted, so I may very well have precipitated the unread event. Otherwise, so far so good...
(In reply to Rich Gray (:rbgray) from comment #78) > Running SeaMonkey 2.7b4 > Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0) Gecko/20120119 > Firefox/10.0 SeaMonkey/2.7 > > Two strange things so far. When I initially started 2.7b4, my third news > account, changed its name from the former value of news.individual.net (or > just "individual.net") to > rbg_sm+graysmail.comIndividual.netrbg_sm+graysmail.com. That's pretty ugly. > Haven't tried to edit it back yet. > > I went through the account server settings and re-enabled check at startup > and periodic checking. When I did a Get Msgs on Individual.net, went trough > yet another round of user/password prompting and then all messages were new > again. It could very well be that I fell victim to the phasing and supplied > the wrong info for the wrong account. I'm pretty sure I entered incorrect > info for Individual.net and got re-prompted, so I may very well have > precipitated the unread event. Otherwise, so far so good... Hm. According to comment #74, the bug ought to have been fixed on that code branch three and a half days before. If you can confirm with certainty that you still get the bug in this build, then it would seem that the fix wasn't perfect.
(In reply to Tony Mechelynck [:tonymec] from comment #80) >>[glitch running SM2.7b4 - patch landed] > Hm. According to comment #74, the bug ought to have been fixed on that code > branch three and a half days before. If you can confirm with certainty that > you still get the bug in this build, then it would seem that the fix wasn't > perfect. I think I may have precipitated the glitch when I incorrectly entered my credentials. Smooth sailing since. I suppose I should do some starts & stops and intentionally clear and reenter credentials to test it out. Will try to do so this weekend.
Summary: Thunderbird sometimes marks entire newsgroups as unread → Thunderbird sometimes marks entire newsgroups as unread (TB generates NNTP requests with bad formatting or in wrong order, so access to news server fails repeatedly)
(In reply to Rich Gray (:rbgray) from comment #78)
> When I initially started 2.7b4, my third news account, changed its name from the former value of news.individual.net (or just "individual.net") to
> rbg_sm+graysmail.comIndividual.netrbg_sm+graysmail.com.

Who changed it? Which name was changed, and where? If you changed "Server Name:" under Server Settings, the new value is saved in realhostname and is used for server access. If this setting is changed, all news articles are downloaded again even when only the server name differs (all newsgroups being the same), because it is treated as an entirely different news server. hostname is not changed once it has been defined for any serverX.type. The password manager's key is hostname rather than realhostname, and I think that applies to News too. Please be careful in interpreting the phenomenon if userid/password is involved and you changed the "Server Name:" setting.

The following are the news account definitions in my Tb profile for testing.
> user_pref("mail.server.server5.name", "news1 on news.mozilla.org");
> user_pref("mail.server.server5.type", "nntp");
> user_pref("mail.server.server5.hostname", "x.x.x");
> user_pref("mail.server.server5.realhostname", "news.mozilla.org");
> user_pref("mail.server.server5.newsrc.file", "C:\\ ... \\News\\x.x.x.rc");
>
> user_pref("mail.server.server7.name", "news.opera.com");
> user_pref("mail.server.server7.type", "nntp");
> user_pref("mail.server.server7.hostname", "news.mozilla.org");
> user_pref("mail.server.server7.realhostname", "news.opera.com");
> user_pref("mail.server.server7.newsrc.file", "C:\\ ... \\News\\news.mozilla.org.rc");

Because of the above settings, I frequently find myself looking into \news.mozilla.org.rc when I want to see the .rc file content for news.mozilla.org :-)
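To make the hostname/realhostname distinction concrete, here is a rough sketch that reads the sample prefs quoted above and prints, for each news account, the newsrc file name (keyed on hostname) next to the server that is actually contacted (realhostname). The helper names are hypothetical, the `<hostname>.rc` naming follows the earlier description in this bug, and the fallback to hostname when realhostname is absent is an assumption, not confirmed Thunderbird behavior:

```python
import re

# Sample taken from the prefs quoted above (newsrc.file lines omitted).
SAMPLE_PREFS = '''
user_pref("mail.server.server5.name", "news1 on news.mozilla.org");
user_pref("mail.server.server5.type", "nntp");
user_pref("mail.server.server5.hostname", "x.x.x");
user_pref("mail.server.server5.realhostname", "news.mozilla.org");
user_pref("mail.server.server7.name", "news.opera.com");
user_pref("mail.server.server7.type", "nntp");
user_pref("mail.server.server7.hostname", "news.mozilla.org");
user_pref("mail.server.server7.realhostname", "news.opera.com");
'''

# Matches string-valued mail.server.serverN.* prefs only.
PREF_RE = re.compile(
    r'user_pref\("mail\.server\.(server\d+)\.([\w.]+)",\s*"([^"]*)"\);'
)


def parse_servers(prefs_text):
    """Group mail.server.serverN.* prefs into one dict per server."""
    servers = {}
    for server_id, key, value in PREF_RE.findall(prefs_text):
        servers.setdefault(server_id, {})[key] = value
    return servers


def connect_host(prefs):
    """Host used for server access (assumed fallback: hostname)."""
    return prefs.get("realhostname", prefs["hostname"])


def newsrc_name(prefs):
    """File name of the form <hostname>.rc, keyed on the fixed hostname."""
    return prefs["hostname"] + ".rc"


for server_id, prefs in sorted(parse_servers(SAMPLE_PREFS).items()):
    if prefs.get("type") == "nntp":
        print(server_id, newsrc_name(prefs), "->", connect_host(prefs))
```

With this sample, the file named news.mozilla.org.rc belongs to the account that talks to news.opera.com, which is exactly the confusion described above.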
(In reply to WADA from comment #82)
> (In reply to Rich Gray (:rbgray) from comment #78)
> > When I initially started 2.7b4, my third news account, changed its name from the former value of news.individual.net (or just "individual.net") to
> > rbg_sm+graysmail.comIndividual.netrbg_sm+graysmail.com.
>
> Who changed? What kind of name at where was changed?

I have just seen something related. The names of the last 6 of my news accounts are changed by inserting multiple times the string 'hafi'.

hafi@i5_64 ~/.mozilla/seamonkey/yjmshkey.default $ grep 'name", "hafi' prefs.js
user_pref("mail.server.server12.name", "hafinewshafi1.open-news-hafinetwork.orghafi");
user_pref("mail.server.server13.name", "hafireadhafi.news.telefohafinica.dehafi");
user_pref("mail.server.server14.name", "hafinewshafi.alice-dsl.dhafiehafi");
user_pref("mail.server.server5.name", "hafifreehafinews.netfronhafit.nethafi");
user_pref("mail.server.server8.name", "hafinewshafi.tota-refugihafium.dehafi");
user_pref("mail.server.server9.name", "hafinewshafi.online.dehafi");

Only mail.server.server??.name is changed, all other mail.server.server??.* remain unchanged. The names of prior news servers are not affected. BTW, hafi is my user name.
(In reply to Hartmut Figge from comment #83)
> I have just seen something related. The names of the last 6 of my news
> accounts are changed by inserting multiple times the string 'hafi'.
>
> (...)

If you rewrite the strings, the issue becomes clearer:

hafi news hafi 1.open-news- hafi network.org hafi
hafi read hafi .news.telefo hafi nica.de hafi
hafi news hafi .alice-dsl.d hafi e hafi
hafi free hafi news.netfron hafi t.net hafi
hafi news hafi .tota-refugi hafi um.de hafi
hafi news hafi .online.de hafi

So in between "hafi"s there are parts of certain lengths (4, up to 12, rest) of the old account name, and "hafi" is both at the beginning and end of all strings.

As I already noted on the newsgroups (somewhere...), I've seen this bug years ago but was never able to reproduce it. Thanks to your list of broken strings (heh) I was able to track it down this time: I'm 99% sure nsMsgIncomingServer::OnUserOrHostNameChanged is the culprit. As the name implies, a username or hostname change triggers a renaming of the account name (prettyName internally): "replace all occurrences of old name in the acct name with the new one". It seems the code is not always doing the right thing (maybe if oldName or newName are empty?). Do you remember what exactly you did and what you entered?

In any case I'm pretty sure the issue you see is not this bug, so please file a new one (Product MailNews Core), CC me and post the bug number here for reference (I'm not a back-end guy so we'll need help there). Thanks!
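The decomposition described above can be checked mechanically. The following is only a sketch for inspecting the broken strings; it does not reproduce what nsMsgIncomingServer::OnUserOrHostNameChanged does. It assumes the inserted token ("hafi") never occurs in the original account name, and the function names are made up for illustration:

```python
# Broken account names as reported in comment #83.
BROKEN_NAMES = [
    "hafinewshafi1.open-news-hafinetwork.orghafi",
    "hafireadhafi.news.telefohafinica.dehafi",
    "hafinewshafi.alice-dsl.dhafiehafi",
    "hafifreehafinews.netfronhafit.nethafi",
    "hafinewshafi.tota-refugihafium.dehafi",
    "hafinewshafi.online.dehafi",
]
TOKEN = "hafi"  # the user name that was inserted


def split_segments(name, token):
    """Return the non-empty pieces of the name that lie between the tokens."""
    return [segment for segment in name.split(token) if segment]


def recover_name(name, token):
    """Rebuild the presumed original name by dropping every inserted token.

    Assumption: the original name did not itself contain the token.
    """
    return "".join(split_segments(name, token))


for broken in BROKEN_NAMES:
    segments = split_segments(broken, TOKEN)
    print([len(s) for s in segments], recover_name(broken, TOKEN))
```

For every reported name the first segment is 4 characters long and the second is at most 12, matching the "4, up to 12, rest" pattern noted above.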
(In reply to Jens Hatlak (:InvisibleSmiley) from comment #84) > In any case I'm pretty sure the issue you see is not this bug, so please > file a new one (Product MailNews Core), CC me and post the bug number here > for reference (I'm not a back-end guy so we'll need help there). Thanks! Hafi filed bug 720199. Continuing discussion there.
Keywords: regression
Has this fix been backported to 9, and if not will it be? I'd like it to hit Ubuntu 11.10 sometime...
(In reply to Michael from comment #89) > Has this fix been backported to 9, and if not will it be? I'd like it to hit > Ubuntu 11.10 sometime... No, TB 10 is out - we don't patch previous rapid release versions of Thunderbird when newer releases supersede them, except for the ESR releases.
I don't buy that this is fixed. This bug (as described better in bug 702038) STILL exists and renders news.mozilla.org unusable -- in TB 17.0.2!!!
Same for me. This bug is not fixed for me in TB 17.0.2. What I see however is not bug 702038 but bug 693575 (which was declared to be a duplicate of this bug, which is why I'm reporting here).
(In reply to John David Galt from comment #91)
> I don't buy that this is fixed. This bug (as described better in bug
> 702038) STILL exists and renders news.mozilla.org unusable -- in TB 17.0.2!!!

If your bug, as described, is not fixed, then simply undup your bug. No need to comment here.

(In reply to fvogelnew1 from comment #92)
> Same for me.
> This bug is not fixed for me in TB 17.0.2. What I see however is not bug
> 702038 but bug 693575 (which was declared to be a duplicate of this bug,
> which is why I'm reporting here).

In that case, I suggest you ask the reporter in that bug report. If he/she says it is fixed, then you should file a new bug. Otherwise, the reporter of 693575 should reopen their bug.