Open Bug 622484 Opened 15 years ago Updated 5 months ago

Disabled newsgroup filter with custom header condition (user-agent for example) causes download of full article headers

Categories

(MailNews Core :: Filters, defect)

x86
Windows XP
defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: david, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, reproducible)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.16) Gecko/20101123 SeaMonkey/2.0.11 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7 If I subscribe to a new (new to me) newsgroup that has many messages on my NSP's NNTP server, downloading the headers takes far too long. This appears to result from running my server-level filters on the downloaded headers. At my server-level, I have 8 named filters with a total of 67 filter conditions. I subscribed to comp.security.misc, which had over 25,000 messages. When I opened comp.security.misc, I indicated to download 500 of the messages. The header counter in the status bar of the Thunderbird window showed that headers were being downloaded in chunks of 100 or more, very quickly. Then the header counter started again at 1, showing headers were being downloaded 1-4 at a time, very slowly. Then the counter started again as if Thunderbird were looping. This problem occurred even when I disabled all 8 filters. The problem did NOT occur when I moved file msgFilterRules.dat from the newsgroup account in the Thunderbird profile directory to a different hard drive. This indicates that filtering occurs even when all filters are disabled. This also indicates that filtering is inefficient when applied to a large number of headers. Reproducible: Always
David, via procmon, can you confirm whether you see bug 571419 in this bug's behavior? If so, 571419 may be a big part for performance hit. > This indicates that filtering occurs even when all filters are disabled. that shouldn't be happening either
Keywords: perf
I'm on Windows XP and do not have anything called procmon. As I recall, that is a UNIX tool; but I might be wrong. I do have Performance Monitor. If that will do, tell me which Performance Object and which counters for that Object you want. Summary changed to delete "on New Subscriptions". Apparently, this is a problem whenever a large number of headers are downloaded while the applicable filter set is large.
Summary: Problem With Newsgroup Filters When Downloading Many Message Headers on New Subscription → Problem With Newsgroup Filters When Downloading Many Message Headers
David, procmon is a small MS tool found at http://technet.microsoft.com/en-us/sysinternals/bb896645
No, I have not yet installed ProcMon; however, I have downloaded it. I hope to get to this sometime within the coming week. Today, when I first opened my news.mozilla.org account in Thunderbird, the mozilla.org newsgroup indicated 62 unread messages. Since I had either read all previous messages in that group or had marked them as already read, I knew these all had to be new message. From a prior activity with the news.mozilla.org server, my Server Settings had 50 for "Ask me before downloading more than nnn messages". Thus, I got a dialogue popup in which I changed the 50 to 500 (my usual setting). Thunderbird immediately downloaded the 62 message headers and then slowly proceeded again to download the headers. However, this time, it downloaded 500 of the 62! Obviously, something worse than merely inefficient filtering is involved.
Okay. I installed ProcMon and logged about 2-3 minutes of file system activity for Thunderbird filtering new headers for a newsgroup. I then saved the log as a PML file. It's over 130 MB. I don't know what you (Wayne Mery) want filtered or how you want me to send the log.
Given that this is a windows machine, (and despite the issue is localized to Thunderbird), before we proceed further I feel obligated to ask if the symptoms reduce if you temporarily disable antivirus and non-windows firewall software? Also before continuing to diagnosis, it would be useful to know if conditions improve using thunderbird version 5 And a further thought - can you reproduce this slowness with similar filter conditions but in a newsgroup with far fewer articles?
Summary: Problem With Newsgroup Filters When Downloading Many Message Headers → Slow performance with many Newsgroup Filter conditions When Downloading Many News Article Headers
I'm another which have discussed the issue with David in the newsgroups and have experienced the same before, but using SeaMonkey instead of Thunderbird. I don't run any anti-virus or firewall software for that matter. The issue is steadily reproducible with a single filter with a single rule which references a header which value could only be fetched with a whole message. The problem is, even when the filter is disabled it continues to cause download of the whole messages. The number of new articles in a group surely affects the overall time spent. The news server seems to matter, also - the news://news.mozilla.org/ is much slower than news://news.eternal-september.org/ for me, for example. I'm not sure whether this issue targets just fixing the problem with disabled filters causing unnecessary download of the whole messages, or how to improve the download of whole messages. I guess the later is not something which could be improved significantly.
Yes, the slowness is proportional to the number of headers. A newsgroup with 10 new headers goes an order of magnitude faster than a new newsgroup with 50,000 headers. The actually downloading is quite fast through my broadband connection. It's the second pass through the headers -- the filtering pass -- that is slow. I suspect there is an inefficiency in how TBird loops through messages and filters. I once coded such an inefficiency myself when I first started programming almost 50 years ago. It involved a loop within a loop. By taking the inner loop and making it the outer loop, a process that would take an hour instead completed in a minute with no real impact on the result. It will be a few days before I have time to try recreating the problem with firewalls and anti-virus disabled. I still have the Logfile.PML file from 10 Feb 11. Does anyone still want it?
(In reply to comment #6) > And a further thought - can you reproduce this slowness with similar filter > conditions but in a newsgroup with far fewer articles? to clarify comment 6 - what I mean is whether the increase in time to process X new headers changes with an increase in Y existing articles. For example - if 50 messages downloaded into a newsgroup of 1,000 messages takes 60 seconds - then 50 messages downloaded into a newsgroup of 10,000 message should NOT take substantially more than 60 seconds.
(In reply to comment #7) > I'm another which have discussed the issue with David in the newsgroups and > have experienced the same before, but using SeaMonkey instead of > Thunderbird. I don't run any anti-virus or firewall software for that > matter. > > The issue is steadily reproducible with a single filter with a single rule > which references a header which value could only be fetched with a whole > message. by whole message, do you mean body of the news article? D.E.R. has mentioned only headers, and not body, so your issue might need a different bug report. Is there anything useful in the newsgroup postings you have mode? xef bug 138208, bug 571419
> by whole message, do you mean body of the news article? I admit I'm just speculating. I don't know much of the NNTP mechanics, but I'm aware the protocol has a request type fetching just few basic headers, and for the rest it has to fetch the full headers, if not the full message including body.
I waited and waited, without connecting to a news server that I often use. I disabled my anti-virus application and both the software (Windows XP) and hardware (Netgear router) firewalls. Then I went to the server. There were too few new messages to test this problem. I will have to repeat the test by subscribing to a really large newsgroup. That will have to wait.
Today, I ran three tests. For Test #1, I disabled all filters by changing the name of the msgFilterRules.dat file; for this test, headers downloaded in seconds. For Test #2, all my filters were enabled. Test #3 repeated Test #2 but with anti-virus software and firewalls disabled. For these two, not all headers downloaded even after many minutes. Detailed descriptions of these tests are in the attached PDF file. The file also describes my configuration.
Whiteboard: [has test data]
(In reply to David E. Ross from comment #5) > I installed ProcMon and logged about 2-3 minutes of file system > activity for Thunderbird filtering new headers for a newsgroup. > I then saved the log as a PML file. If PML file, Performance Monitor is neded to see log data. Save in .csv file. If .csv file, you can see it by Text Editor, Excell, Lotus 1-2-3 etc. > It's over 130 MB. Trace/Log data is usualy huge. Get log for short period first, check .csv file content by text editor, add filter rule to reduce irrelevant logs(limit file name, etc.), stop excess activity(new mail check by Tb etc.), get log for a while again, ...., repeat it. (In reply to David E. Ross from comment #13) > Test Procedures and Results for 3 Tests To know what is executed at NNTP protocol level(download of minimum headers, download of full headers, download of body, etc.), and to know how long does it take to execue an action(dwenload of body etc.), Tb's protocol log with timestamp is needed. Get minimum NNTP log for download of small number of news articles(for example, 10 articles only) first, view log file, check NNTP level flow. See bug 402793 comment #28 for getting log. SET NSPR_LOG_MODULES=timestamp,nntp:5 Stop automatic new message check of any account, or create new profile, create relevant news account only, subscribe relevant newsgroup only. Get minimum NNTP log for two cases: (same anti-virus/firewall condition) (a) With message filter disabled. (b) With message filter enabled. What kind of difference is seen in log? Body is fetched in (b) only? By the way, as performance related problem, performance data is needed. At least check of difference among following cases is needed. > Number of downloaded message > (A) With message filter disabled : (A-1) small(100), (A-2) medium(1,000), (A-3) large(10,000) > (B) With message filter disabled : (B-1) small(100), (B-2) medium(1,000), (B-3) large(10,000) > (number in above is merely example. required number depends on environment or test case) > (number of "large" case should be carefully chosen. minimum "large value" should be chosen in testing.) If time to complete process is O(number of message), it's usually normal, Required time per a mail in (A) == Ta, Number of messages=N, Total time for (A) == Ta * N Required time per a mail in (B) == Tb == Ta + X, Number of messages=N, Total time for (B) == Tb * N (== { Ta + X } + N ) and if X is far larger than our expectation, something wrong may exist in the additional X. If time to complete process is O( {number of message} ** 2 ) or something, it's apparently something wrong exists in multiple news message processsing.
I only generated the Procmon log in response to the request from Wayne Mery (comment #1 and comment #3). If I could view it, I would not know how to use it. The log would now be a year (and many versions of Thunderbird) old and not likely to be of use today. It has thus been deleted. In the meantime, I have confirmed (comment #13 and the attachment) that this is indeed a problem with using filters and is not affected by the presence of a firewall or anti-virus scanning. I retired from software testing more than 8 years ago and have become somewhat rusty. When I was a software tester, it was at the system level and not the component level. Thus, I do not feel adequate to perform the additional testing described in comment #14. I would hope someone currently active in development or test for Thunderbird would do this.
(In reply to David E. Ross from comment #15) > Thus, I do not feel adequate to perform the additional testing described in comment #14. I see. Closing as INCOMPLETE, because no evidence of "performance problem due to Tb's bad code actually exists" will never be provided by opener of this *bug* report at B.M.O. Please note that here is B.M.O to report actual BUG of software called Mozilla to developer and here is never support forum nor Customer Support/Help Center.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → INCOMPLETE
I am not the only user who has observed this problem; see comment #7. I am not an active Mozilla developer or tester. I believe I have provided sufficient evidence of a real Thunderbird problem. It is now time for active professionals (volunteer or paid) to resolve it. My attachment provides a test case that indeed shows a major timing difference between downloading 12,000+ headers with and without filters. This is a Thunderbird problem. It is not a firewall or anti-virus problem. Since my test was run all within the same hour, it is not likely a broadband problem. I strongly doubt this is a news.mozilla.org problem. I will not put here what I was really thinking when I saw comment #16. However, that comment might as well ask me to provide the corrective code as it does explicitly ask for test results beyond my ability.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
(In reply to David E. Ross from comment #13) > Created attachment 585056 [details] > Test Procedures and Results for 3 Tests > > Today, I ran three tests. For Test #1, I disabled all filters by changing > the name of the msgFilterRules.dat file; can you attach msgFilterRules.dat to the bug report. Or email it?
Status: REOPENED → UNCONFIRMED
Ever confirmed: false
Flags: needinfo?(david)
(In reply to David E. Ross from comment #13) > Created attachment 585056 [details] > Test Procedures and Results for 3 Tests > > Today, I ran three tests. For Test #1, I disabled all filters by changing > the name of the msgFilterRules.dat file; for this test, headers downloaded > in seconds. For Test #2, all my filters were enabled. Test #3 repeated > Test #2 but with anti-virus software and firewalls disabled. For these two, > not all headers downloaded even after many minutes. > Detailed descriptions of these tests are in the attached PDF file. The file > also describes my configuration. you *really* should be testing with no addons from the pdf ... Setup for all tests 1. Thunderbird is not running. 2. All tests begin without a subscription to for the mozilla.dev.apps.thunderbird newsgroup. 3. Verify that the Thunderbird profile has no files for the mozilla.dev.apps.thunderbird newsgroup. Note: At start of downloading, the indication is there are 12,681 headers. Test #1: No filters 1. Rename file msgFilterRules.dat to be x- msgFilterRules.dat-x to suppress use of filters. 2. Launch Thunderbird with the account for news.mozilla.org collapsed. 3. Subscribe to mozilla.dev.apps.thunderbird. 4. Expand the account for news.mozilla.org. 5. When the Download Headers popup dialogue appears, select the “Download all headers” radio button and then the Download button. 6. Note the time taken to download. 7. Terminate Thunderbird. 8. Delete file msgFilterRules.dat and rename file x-msgFilterRules.dat-x to be msgFilterRules.dat to enable the use of filters. Result: Downloading required less than 30 seconds. Test #2: Filters 1. Launch Thunderbird with the account for news.mozilla.org collapsed. 2. Subscribe to mozilla.dev.apps.thunderbird. 3. Expand the account for news.mozilla.org. 4. When the Download Headers popup dialogue appears, select the “Download all headers” radio button and then the Download button. 5. Note the time at start of download. 6. Terminate Thunderbird after 20 minutes. Result: Less than 1,600 headers downloaded (slower than 80 headers per minute). Test #3: Filters with No Anti-Virus or Firewalls 1. Disable Windows XP and Netgear router firewalls. 2. Disable AVG Anti-Virus for the maximum allowed duration of 15 minutes. 3. Launch Thunderbird with the account for news.mozilla.org collapsed. 4. Subscribe to mozilla.dev.apps.thunderbird. 5. Expand the account for news.mozilla.org. 6. When the Download Headers popup dialogue appears, select the “Download all headers” radio button and then the Download button. 7. Note the time at start of download. 8. Terminate Thunderbird after 14 minutes. 9. Enable Windows XP and Netgear router firewalls. 10. Enable AVG Anti-Virus for the maximum allowed duration of 15 minutes. Result: Less than 800 headers downloaded (slower than 60 headers per minute). Configuration Windows XP SP3
WFM Summary - I can't reproduce this in my version 24 attempt with a reduced testcase, i.e. fewer filters. I tested current trunk, not D.E.R.'s version 9. Several areas including filters have changed between version 9 and my version 24. Adding any more filters increases download time, but increasing number of filters does not substantially affect download time. Situation: need D.E.R. restest with version 24, no addons, no AV software - in fact testing with XP in windows safe mode is recommended. If problem still reproduces, determine a) reduced testcase - for example what is the least number of filters needed to substantially reproduce the problem? b) what is the total CPU consumption during the download period? (so we can say whether there is a code efficiency problem) My setup: 24.0a1 (2013-06-08) on vista laptop cablemodem via home wireless mozilla.support.thunderbird 13406 articles ~4.7MB .msf newsgroup file 1. no filter 18 seconds download 2. 4 filter sets, two each of match=any [1] and match=all [2] 25 seconds download http://people.mozilla.com/~bgirard/cleopatra/?1370729030217#report=a05981c03388a0115a3b8febced7cfa959653506 3. 2 filter sets, two match=all [2] 25 seconds download 4. 2 filter sets, two each of match=any [1] and two match=all [2], plus 5 filters with one subject contains zzzz, and one filter with 50 match any subject contains zzzz 25 seconds download running test #2 filters on already downloaded newsgroup is 3 seconds cpu, ~3 seconds wall time. So, there is a disconnect of roughly 4 seconds. But still, overall performance is good. [1] match=any are both subject contains abcz [2] match=all are both subject contains abcz and zz
Whiteboard: [has test data] → [closeme 2013-06-25]
(In reply to Stanimir Stamenkov from comment #7) > The issue is steadily reproducible with a single filter with a single rule > which references a header which value could only be fetched with a whole > message. The problem is, even when the filter is disabled it continues to > cause download of the whole messages. Do you mean a custom header? Details please. Example filter definition, etc Thanks
Flags: needinfo?(stanio)
Per request in comment #18.
Flags: needinfo?(david)
I can reproduce with attachment 760162 [details]. Key point is this odd behavior ... A. I see ALL the headers download quickly. B. Then they start downloading again, and status bar increments at rate of 1-2 headers at a time. At the end of A mozilla.dev.apps.thunderbird.msf is still 0 bytes.
D.E.R. you don't state in your setup that you are checking headers for user-agent. In the filter UI, does this filter term show in your Google filter condition? Defining a minimal testcase here is key... Reproduces with smaller msgfilters.dat and tiny newsgroup mozilla.tools.socorro version="9" logging="yes" name="auto responder" enabled="yes" type="20" action="Mark read" condition="AND (subject,contains,Schallock, Felix)" name="Google" enabled="no" type="20" action="Mark read" condition="AND (\"user-agent\",contains,G2/1.0)" ** Note, the google filter is DISABLED in the testcase. ** Deleting the google filter avoids the problem. The user-agent condition does NOT display in my filter UI condition="AND (\"user-agent\",contains,G2/1.0)" All the google filters in attachment 760162 [details] are disabled and have a header condition term So issue is at least partly that the header condition affects the filtering list and download process even though the filter is disabled.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(stanio)
Keywords: reproducible
Summary: Slow performance with many Newsgroup Filter conditions When Downloading Many News Article Headers → Newsgroup Filter with user-agent header conditions causes second download of message headers
Whiteboard: [closeme 2013-06-25]
(In reply to Stanimir Stamenkov from comment #11) > > by whole message, do you mean body of the news article? > > I admit I'm just speculating. I don't know much of the NNTP mechanics, but > I'm aware the protocol has a request type fetching just few basic headers, > and for the rest it has to fetch the full headers, if not the full message > including body. precisely correct. it's getting the full headers. Which it should do, if the filter with the condition is enabled.
Summary: Newsgroup Filter with user-agent header conditions causes second download of message headers → Disabled newsgroup filter with user-agent header condition causes download of full article headers
Windows XP SP3 Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/20130509 Thunderbird/17.0.6 Regarding comment #19: I ran all three test cases as stated in comment #19, each with a separate launch of Thunderbird in Safe Mode and with all accounts collapsed. I verified that all extensions and plugins were disabled. During the tests, I avoided other Internet activities. I have a 15 Mbps-rated broadband connection via a cable modem (RoadRunner/Time Warner) wired to a Netgear N300 router that, in turn, is wired to my PC. I am NOT using a wireless Internet connection. Before running Test Case #1, I did not have a subscription for mozilla.dev.apps.thunderbird; nevertheless, I checked and verified no files existed for that newsgroup, including no entry for that newsgroup in file news.mozilla.org.rc. At the start of Test Case #1, there were 13,696 headers to download. Downloading took 35 seconds (391 headers per second). Before running Test Case #2, I deleted the subscription for mozilla.dev.apps.thunderbird from Test Case #1 and verified no files remained for that newsgroup, including no entry for that newsgroup in file news.mozilla.org.rc. At the start, there were still 3,696 headers to download. I terminated Thunderbird after 20 minutes, at which time 1051 headers had downloaded (0.9 headers per second). Before running Test Case #3, I disabled AVG 2013 until the next reboot, the Windows XP firewall, and the firewall in my Netgear N300 router. I deleted the subscription for mozilla.dev.apps.thunderbird from Test Case #2 and verified no files remained for that newsgroup, including no entry for that newsgroup in file news.mozilla.org.rc. At the start, there were still 3,696 headers to download. I terminated Thunderbird after 14 minutes, at which time 741 headers had downloaded (0.9 headers per second). Re comment #24 and comment #25: Yes, I have custom headers in the five Google filters, including UA headers. In my tests, however, all five Google filters were disabled. Disabled filters should be ignored and not cause the processing that I have observed. --------------------------- Perhaps the filters should be parsed before any downloading. Then, if a filter requires a full header (e.g., because of a custom header in the filter), full headers should be downloaded ONCE instead of downloading partial headers. However, a disabled filter does NOT require any header -- full or partial. Furthermore, if partial headers can be downloaded at the rate of almost 400 per second, I do not understand why full headers download at the rate of less than one per second. Surely a full header is not 400 times larger than a partial header. Is Thunderbird checking each header as it downloads? Might it be faster to download all the full headers and then check them?
(In reply to Wayne Mery (:wsmwk) from comment #24) > D.E.R. you don't state in your setup that you are checking headers for > user-agent. > In the filter UI, does this filter term show in your Google filter condition? I of course know you "have custom headers in the five Google filters". My question is different. In the filter *UI*, does this filter term "user-agent" *show* in your Google filter condition?
Flags: needinfo?(david)
Wayne, have you added the "user-agent" as a custom field in the filter editor?
Re comment #27: Sometimes the UA string (header line User-Agent) is merely G2/1.0. I generally disable this filter because there are often legitimate messages with that. Sometimes header line User-Agent with G2/1.0 is coupled with the header line X-HTTP-UserAgent with Trident. I cannot remember if that is the entire value of X-HTTP-UserAgent or a fragment. Some spam seems to be associated with particular IP addresses. Filter Google3 has the header line NNTP-Posting-Host for a number of IP addresses, including some fragments of IP addresses. Then there is the filter Google4, which has User-Agent with G2/1.0 and the header line Injection-Info with an IP address. And filter Google5, again with User-Agent with G2/1.0 and the header line Injection-Info with a fragment of an IP address. All of these filters were constructed by examining the source of spam, flame war, troll, pornography, and other unwanted postings to various news.mozilla.org newsgroups.
Flags: needinfo?(david)
memory coming back - if my recollection is correct ... Once the custom filter is defined it does not matter that the filter is disabled. The specified custom header information is forever associated with the account (filters are per account) so that henceforth the required header data is in the .msf file for potential filtering in the future. Regardless of account type: imap, nntp, etc
Summary: Disabled newsgroup filter with user-agent header condition causes download of full article headers → Disabled newsgroup filter with custom header condition (user-agent for example) causes download of full article headers
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: