(build 2000112208) I'm on a 640k/7Mbit ADSL line; my ISP carries about 35k groups on its news server, and the group file is a little over 1.6 MB. With tin, fetching the groups takes less than 5 seconds, and the network meter shows speeds around 800 KB/s during the transfer. With Mozilla, the groups arrive at about 20 KB/s and refreshing the group list takes several (2-3) minutes. During the refresh, CPU load is near 0%. Why is this so slow, when CPU speed and network speed are clearly not the issue?
What would be interesting would be to see a trace of the commands sent to the news server. Is it possible for you to use tcpdump or something similar to log what's happening? Maybe tin uses some other (more efficient) method than mozilla.
Wow. That was quick. Is it possible to get the output as ASCII characters too? I think I have seen it nicely formatted somewhere, but maybe that was a program other than tcpdump; I find the hex dump a little hard to read. Anyway, one big difference is that tin's TCP segments are all 1448 bytes, while Mozilla's vary in size constantly, mostly around 900-1000 bytes. Tin's transfer clearly looks more efficient, but what causes the packet sizes to vary is beyond my knowledge. I will cc email@example.com, who might know why that is. I assume that news uses Necko for the network communication? Taking that as a target for this bug, I will confirm it.
Status: UNCONFIRMED → NEW
Ever confirmed: true
http://xxx.ath.cx/tcpdump-tin-ascii http://xxx.ath.cx/tcpdump-mozilla-ascii I hope they're of use; I don't know what to make of them :)))
Changing QA contact to Daniel Bratell, thanks!
QA Contact: esther → bratell
For purpose of being able to see this bug in a special query, I am re-assigning QA to myself, but Janne, please feel free to keep on this...
QA Contact: bratell → stephend
If I read the traces correctly, tin does:

MODE READER\r\n
XOVER\r\n          (server answered "Not in a newsgroup")
LIST OVERVIEW\r\n  (server answers with some data)
LIST\r\n

while for Mozilla only a LIST\r\n is shown. I don't know if the trace is truncated, but there do seem to be some differences. What's XOVER? What's MODE READER? I don't know if this affects the performance, but it would be interesting to know what would happen if Mozilla sent the same commands before the LIST, if it doesn't already. Janne, could you try to start the trace for Mozilla earlier, before the connection is made (I don't know when that happens), so that we can see everything Mozilla sends? Alternatively, maybe sspitzer can tell us.
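For reference, here is a sketch of what such a session looks like on the wire. The response texts are paraphrased from the NNTP specs (RFC 977 and the RFC 2980 extensions), not taken from these traces, and the exact codes and wording vary by server:

```
C: MODE READER
S: 200 reading service ready, posting allowed
C: XOVER
S: 412 no newsgroup has been selected     <- tin probing overview support
C: LIST OVERVIEW.FMT
S: 215 order of fields in overview database
S: Subject: ... (one header name per line, terminated by ".")
C: LIST
S: 215 list of newsgroups follows
S: comp.lang.c 123456 100000 y            <- one group per line, terminated by "."
```

MODE READER asks a hybrid transit/reader server to switch to its reading backend, and XOVER fetches per-article overview data for the currently selected group (hence the error when no group is selected). Neither command should change how fast the LIST data itself is transferred, so on the face of it they don't explain the speed difference.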
Did new ASCII dumps, see the URLs above. One thing should be noted: while Mozilla is fetching the group list, the TCP receive queue from the NNTP server is constantly full. So there is data to be read, but for some reason we read it slowly. It might therefore not be so much a protocol issue as something in the way Mozilla reads and handles data from the socket.
I see in your last trace that we don't do anything other than LIST. I wonder what MODE READER and XOVER do; I will try to remember to look them up. That the network buffer is full could very well cause what we see, with the varying packet sizes. So, who's to blame? Are we not checking the buffers often enough? Are they too small? Can we increase their size and test? *changed summary* Btw, how/where did you see that the buffers are getting full?
Summary: getting newsgroups is agonizingly slow. → Getting newsgroups is much slower than other programs (too small network buffer?).
The Recv-Q column in netstat shows bytes in the kernel's TCP buffer that have not yet been read by the program that owns the socket. Mine constantly shows over 55 KB in the buffer, so we're simply not reading data from the socket as fast as we could.
Removing "too small network buffer?" from the subject since it's kind of irrelevant and not in Mozilla's control: the problem is not that the buffer is too small, but that we don't read it fast enough. A thing to add to my previous comment: with tin, the Recv-Q in netstat shows either 0 or at most 2 KB waiting in the buffer when checked continuously during the fetch, which means the kernel can keep accepting data into the buffer from the other host the whole time.
Summary: Getting newsgroups is much slower than other programs (too small network buffer?). → Getting newsgroups is much slower than other programs.
Adding self to CC to keep track of bug.
I've done some investigation and it seems we can't add groups faster than 40000 groups/second, and that's assuming no code overhead at all and a perfectly exact timer. nsNNTPProtocol::ReadNewsList is controlled by a timer and won't take more than 20 groups at a time. After 20 groups it starts a new timer at 0.5 milliseconds (20 groups * 2000 fired timers = 40000 groups). Since there are delays, I think at most 10-100 timers might actually fire every second, which could be one big reason for the long update time compared to, for instance, tin. Also, the NET_IS_SPACE macro in that function uses an extraordinary amount of time (49% of the total, according to Quantify). I wonder if the isspace function (which the macro uses) might be locale sensitive and therefore much slower than checking for ' ' or '\t' directly. I will try that change too.
I changed READ_NEWS_LIST_COUNT_MAX from 20 to 100 and replaced the isspace call with a check for ' ' or '\t'. That reduced the time to read the group list from 70 to 25 seconds, and I didn't see any unresponsiveness after the change. My computer is a Celeron running at 375 MHz and I have 512 kbps to the news server. I will continue to experiment. jhayryne, do you do your own builds, and can you check whether you get the same result?
I tried some other values of READ_NEWS_LIST_COUNT_MAX. Remember that 20-25 seconds is the theoretical minimum with my net speed:

READ_NEWS_LIST_COUNT_MAX   Time to refresh
========================   ===============
  20                           69.7 s
  50                           28.8 s
 100                           25.5 s
 200                           23.0 s
 500                           26.8 s
1000                           23.4 s

With none of these did I notice any problems with responsiveness, but the problems should be larger with a faster network. I will attach a proposed patch and take the bug (please tell me if I shouldn't, but since no one seems to have taken their time I hope no one will object). The patch changes NET_IS_SPACE to check for ' ' and '\t' instead of a check for ASCII (< 0x7f) followed by a call to isspace. RFC 977 (NNTP) says that SPACE or TAB should be used for whitespace, so there is no need for fancy whitespace checks. The patch also changes READ_NEWS_LIST_COUNT_MAX to 200, which seems a reasonable number. If anyone can check this patch with a really fast connection to a fast news server, please comment in this bug on any unresponsiveness in Mozilla during the newsgroup fetch. I think this throttle was more important before sspitzer made the other operations so quick. The patch also changes the code to compare against '.' only once, which might make the code several nanoseconds quicker, and corrects a comment. sspitzer, can you review the patch? This will, if the gods are willing, make the subscribe dialog another ten times faster if the net is fast enough.
Did some tests with different READ_NEWS_LIST_COUNT_MAX values and here's what I got:

 200   19 s
 500   11 s
1000    8 s

Didn't see any problems. Perhaps a value bigger than 200 would scale a bit better on faster links, though. Looks good.
How fast is your computer? Slower computers are worse off when the chunk size increases. Did you try to surf the web or move windows while fetching newsgroups? If you did that without trouble and have a slow computer, it looks like we should go to 1000 or even more. 8 seconds sounds close enough to tin to close this bug, or what do you think? Btw, did you happen to see whether the network buffer problem is gone?
Well, you really can't surf the web while fetching newsgroups since the subscribe window is modal, as was mentioned in the bloat bug; the subscribe window's modality issue is bug 57266. My machine is 600 MHz/256 MB. I tried running my test build remotely on a P90/48 MB, but for some reason it wouldn't start. I'll try again later since it would make a good comparison.
It's only modal to the mail/news-window. You can still work with any browser windows you have open. I just checked this again.
sorry for the delay in commenting, I've been on vacation. I'll go review and try the patch now.
OK. Daniel is right to up the MAX to 200 from 20. I'm not convinced it is safe to change the NET_IS_SPACE macro: isspace() returns true for more than space and tab; among other things, it returns true for \n and \r. Since NET_IS_SPACE is called in other parts of the news code, I'm going to need to verify that switching to Daniel's faster implementation won't break anything. The good news is that by simply changing the MAX, downloading is much faster. I've got changes to properly reflect the download rate in the subscribe dialog (it currently says "?.? Kb / sec"). I'll go measure the download rates with MAX at 20 and MAX at 200 when subscribing to news.mcom.com and news.supernews.com over my ISDN line.
Over my ISDN line talking to news.supernews.com, here's what I get: max == 20, rate is 15K/sec; max == 200, rate is 38K/sec. With max at 500 I didn't see any gain, but my ISDN is pretty slow. Since I also didn't see any responsiveness problems, I'm OK with setting it at 500 to benefit users with faster connections, assuming this doesn't hurt responsiveness on the target machine over a modem connection. (The target machine has 64 MB RAM and about a 200-300 MHz CPU, I think.)
I'm not 100% sure either. I read the spec, and checking for SPACE and TAB is enough while listing news groups, and all the places the macro is used (three altogether) handle only one line at a time, so it feels safe. I do know that the isspace function turned out to be really slow on Windows, though. As I said somewhere, 49% of the computation time during group retrieval was spent in that function, so I think we can get somewhere around a 30% speedup by switching to a raw check for SPACE and TAB. It's difficult to make really good measurements since they depend so much on the network. It is the constant that causes the big change, though, at least for people with fast nets. I had somewhere around 15 KB/s before and 50 KB/s afterwards, and I have no idea where the top is. For the modem users: I thought that when the input stream was empty and NNTP_PAUSE_FOR_READ was set, the UI thread would regain control until more data arrived? If that's the case, the problem will only affect people who never run out of data in the input queue and take too long to process the group batch. For the value of the constant, 500 feels safe. Maybe we could go even higher to benefit fast nets even more, but that feels risky until someone with a low-end computer tests it.
you are right about PAUSE_FOR_READ, we should be safe on slower connections since we will not be starving the UI thread. I'll attach the patch I plan to land after I get it super reviewed. note, it includes the UI changes to display the download rate.
adding patch keyword.
cc'ing putterman. A while ago putterman and I were discussing why downloading the list of newsgroups from the server was so slow over a fast connection. Daniel and Janne figured out the problem and provided a patch. Thanks again to Daniel and Janne for the excellent work.
The patch looks fine (even without my nanosecond optimization of the dot reading :-) ). Only one small thing: isn't KB the common abbreviation for kilobytes? (I've not fallen for KiB yet.) -bytesReceived=%S K read (at %SK/sec) +bytesReceived=%SK read (at %SK/sec) Apart from that, I'm satisfied with the patch. Therefore giving the bug back to sspitzer. :-)
Assignee: bratell → sspitzer
Status: ASSIGNED → NEW
I used K because that is what we did in 4.x; let's take the K -> KB discussion to bug #61776. I'll land this patch after I get it super-reviewed. cc'ing bienvenu for super review.
Status: NEW → ASSIGNED
sr=bienvenu (but you might want to double check that this builds on the mac since it has a funny way with 64 bit #'s)
Adding syd to the cc list. He also did a review and gave me some suggestions. I'll attach the final patch before I mark this fixed.
Status: ASSIGNED → RESOLVED
Resolution: --- → FIXED
Note, this builds fine on the Mac. Sadly, I have a Mac at home.
This bug is VERIFIED FIXED on Mac 12-05-00, NT 2000120504, and Linux 2000120508.
Status: RESOLVED → VERIFIED