This was discussed in the newsgroup.
I will shortly attach a testcase HTML page from Steven Leving that demonstrates
this behaviour: it takes about 30s to 1min to load on OS/2 even on up to date
hardware while using nearly 100% CPU. On Windows and Linux this only takes a few
seconds on the same machine.
It was discovered that the scrollbar is repainted very often (in
nsWindow::OnPaint()), up to 50 times per second during load, but window message
processing and painting does not seem to be where the bulk of the time goes.
Instead something outside of fnwpNSWindow() seems to be spending most of the CPU time.
Created attachment 157975 [details]
Zipped copy of a 1.4 MiB HTML page
*** Bug 319047 has been marked as a duplicate of this bug. ***
I tried setting nglayout.initialpaint.delay and it doesn't do much: it delays the initial display of a page for many seconds, but after that Firefox still keeps displaying new content instantly, line by line, both in my case (bug 319047) and with the file attached to this bug.
I believe that this banging on pmshell code 50 times per second is the problem that causes this bug. Note that the OS/2 thread scheduler is quite ****: basically, each thread has a priority and the scheduler does round robin across the ready threads with the highest priority. Because this would cause starvation and poor interactive response, the scheduler has a lot of ad-hoc hacks that increase priority at some specific syscalls (keyboard read, pmshell code, disk I/O) or if the thread hasn't run in about 1 second (to prevent starvation). Now if you call a pmshell syscall 50 times per second, that thread will have very high priority and other threads won't run (except when their starvation timer expires, and then they get only a very small timeslice).
In contrast, the Linux scheduler measures the accumulated execution time of each thread, assigns priority based on this time and doesn't care what the thread is doing. So it doesn't have the problem of hogging the CPU when certain syscalls are used.
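To make the difference between the two scheduling styles concrete, here is a minimal toy model in C. It is only a sketch of the behaviour described above, not the real OS/2 or Linux scheduler: it assumes just two always-ready threads (a GUI thread that constantly makes PM calls, and a worker), and the function names and the `starve_limit` parameter are made up for illustration.

```c
/* Toy model, not the real OS/2 or Linux scheduler: two always-ready
 * threads (GUI and worker), one loop iteration = one timeslice. */

/* Boost-style rule: the GUI thread keeps its priority boosted by frequent
 * PM calls, so the worker only runs once its starvation timer expires
 * (after starve_limit slices without CPU), and then for a single slice. */
unsigned worker_slices_boosted(unsigned total_slices, unsigned starve_limit)
{
    unsigned worker_ran = 0;
    unsigned waited = 0;
    for (unsigned t = 0; t < total_slices; ++t) {
        if (waited >= starve_limit) {
            ++worker_ran;   /* starvation boost: worker gets this slice */
            waited = 0;
        } else {
            ++waited;       /* boosted GUI thread wins the slice */
        }
    }
    return worker_ran;
}

/* Runtime-based rule: always run the thread with the least accumulated
 * CPU time, regardless of which calls it makes. */
unsigned worker_slices_fair(unsigned total_slices)
{
    unsigned gui_time = 0;
    unsigned worker_time = 0;
    for (unsigned t = 0; t < total_slices; ++t) {
        if (worker_time < gui_time)
            ++worker_time;
        else
            ++gui_time;
    }
    return worker_time;
}
```

With 1000 slices and a starvation limit of 99, the boosted model gives the worker 10 slices, while the runtime-based model gives it 500.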
I figured this out: in the case of the discussion forum (bug 319047), the problem is mitigated if you put the Firefox window in the background. When you do this, Firefox still displays one line at a time for a while, but sometimes it makes a huge step, and after a few such steps the page eventually loads. (In contrast, when Firefox is in the foreground, the progress is always one line at a time and the page never loads.) This supports the hypothesis that it's related to the scheduler: the OS/2 scheduler has different priority management for foreground and background processes.
However, this background trick doesn't speed up displaying of TestPage.html from this bug.
I think that instantly calling GUI functions is an error that is causing this. I don't know if profiling would make sense: the profiler won't tell you whether the thread consumed CPU time inside the pmshell syscall or blocked there.
Anyway, you can try lowering the priority of the GUI thread or increasing the priority of other threads to see if it improves things. I can't do it myself because it can't be done with an external application; the syscall for setting thread priority needs to be run directly from the process.
I saved the attachment and extracted it locally. It took yesterday's SM 1.1b nightly 35 seconds to load from SATA on 2.8G P4, pegging CPU the whole time.
I tried this with the available test download, and it indeed takes a long time with FF3.0. Then I tried my own "database", which is the index of the OS2-gg/eCS-gg magazine Draad/2. It's in Dutch, but for measurements that doesn't matter because years, numbers and so on are universal. Online you can find it here:
There is also an offline version if you want to rule out any delay from servers; it's at the bottom of this page:
http://home.hccnet.nl/joop.nijenhuis/os2ecs/ecs5.htm (around 450k for download)
You can unpack all of it into a directory and start the index with the file "draad.htm". There is a readme in Dutch on how to set up Mozilla with it, but there are in fact a number of ways. The given solution is one of many if you want to give it its own icon on the desktop or in a menu.
When you start "draad.htm" you will launch a page with frames. The center frame, the index itself, is at the moment 884k, not as big as the test file, but its content is nothing like that of the test file. When I load this file the first part goes quite fast, but towards the end of the file it gets slower. Therefore I think we have to look for the bug somewhere in the page cache. It looks like some bucket gets full, and on one side we let some water out while on the other side the same amount is coming in. That's a slow process. Of course I could be wrong.
What I did was scroll down during loading to find out which year and number it was busy with. When you constantly push the bar down you can see that it begins to slow down. This has two reasons: one, the available text has grown, and two, that speed limit. The downside of the available test file is that every entry is the same, so you can't see where you are in the file.
Hope this helps to find this bug.
Joop, thanks for the additional test pages, but I don't really see how that can help me. Cache could be an issue, but that is really implemented in a cross-platform way, so then I don't see how that can be an OS/2-only problem.
OK, as we are still lacking a profiler (that I know of) I'm going to try the suggestion that Heiko posted in the newsgroup and try to use GCC coverage and gcov to find out more.
(In reply to comment #6)
> Joop, thanks for the additional test pages, but I don't really see how that can
> help me. Cache could be an issue, but that is really implemented in a
> cross-platform way, so then I don't see how that can be an OS/2-only problem.
As far as I know, but I could be wrong, no two platforms handle memory the same way. The cache in FF is built in memory. So what might be good for Windows or Linux might be bad for OS/2. I still believe that somewhere there is some code around the caching process that behaves badly on OS/2. Maybe it's closer to home and we have to change some settings in about:config. Maybe we have to find the answer in the video handling; after all, it's also a visual problem.
Created attachment 330462 [details]
most often called OS/2 code lines
So I made myself a short version of Steven's test page (one that contains only 7 instances of Header1 instead of 2570). I have a build of Firefox 3.0 that I compiled with GCC coverage analysis on, and then used gcov to analyze which lines were called more often for the full test page than for the short test page.
Surprisingly, the lines in nsWindow, where I mostly suspected the slowdown, are just called 7000 or 14000 times for the long page. The most called lines (as can be seen in this attachment) are in NSPR; to me the stuff about the high resolution timer stood out the most. So I set NSPR_OS2_NO_HIRES_TIMER=1 and ran again, to find that the time needed to load the page decreased from ~3min to ~30s!
So it seems that the time resolution of the timer directly influences how often stuff is redrawn. I'm not sure how to really fix this, but that environment variable at least provides a quick workaround to improve matters.
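For anyone who wants to repeat the gcov comparison described above, the per-line counts of two runs can be compared with a few small helpers. This is only a sketch: it assumes gcov's usual annotated text format (`count:lineno:source`, with `-` for non-executable lines and `#####` for never-executed ones), and the function names are made up here, not part of gcov or of the analysis actually used for the attachment.

```c
#include <stdlib.h>
#include <string.h>

/* Execution count from one line of gcov's annotated output, which has the
 * form "<count>:<lineno>:<source>". Returns -1 for lines gcov marks as
 * non-executable ("-") or that do not parse, and 0 for never-executed
 * lines ("#####" or "====="). */
long gcov_count(const char *line)
{
    const char *p = line;
    char *end;
    long count;

    if (strchr(line, ':') == NULL)
        return -1;
    while (*p == ' ')
        ++p;
    if (*p == '#' || *p == '=')
        return 0;
    count = strtol(p, &end, 10);
    if (end == p || count < 0)
        return -1;
    return count;
}

/* Source line number from the second field, or -1 if it is missing. */
long gcov_lineno(const char *line)
{
    const char *colon = strchr(line, ':');
    char *end;
    long n;

    if (colon == NULL)
        return -1;
    n = strtol(colon + 1, &end, 10);
    return (end == colon + 1) ? -1 : n;
}

/* How many more times a line ran in the long run than in the short run;
 * non-executable or unparsable lines count as 0. */
long gcov_extra_calls(const char *long_run_line, const char *short_run_line)
{
    long a = gcov_count(long_run_line);
    long b = gcov_count(short_run_line);

    if (a < 0) a = 0;
    if (b < 0) b = 0;
    return a - b;
}
```

Feeding matching lines from the two .gcov files (long page and short page) through `gcov_extra_calls` and sorting by the result should bring up the lines that scale with page length.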
(In reply to comment #9)
> just called 7000 or 14000 times for the long page. The most called lines (as
> can be seen in this attachment) are in NSPR, to me the stuff about the high
> resolution timer stood out the most. So I set NSPR_OS2_NO_HIRES_TIMER=1 and ran
> again, to find that the time needed to load the page decreased from ~3min to
> So it seems that the time resolution of the timer directly influences how often
> stuff is redrawn. Not sure, how to fix this really, that environment variable
> at least provides a quick workaround to improve matters.
If I remember correctly, it had something to do with the bar on the right side of the screen which shows up if the page is longer than the screen height. I could be wrong. Anyway, I tried your suggestion, which is also mentioned in the readme. I took my own database, see above. I use RUN with a file FireFox!.ENV. The first run was with the original file. Loading the database to the point that the grey bar in the right corner disappeared took 1 minute 15 seconds. I then modified FireFox!.ENV by adding the line SET NSPR_OS2_NO_HIRES_TIMER=1, started Firefox again and first cleared the cache. Then I loaded the database again; this time it took 18 seconds, almost 1 minute less! I did it twice to rule out errors.
Created attachment 330849 [details] [diff] [review]
Sometimes solutions are sooooo simple. ;-)
Found this typo (I think & instead of && is what should always have been there) in nsWindow::GetLastInputEventTime, so we always reported input even when there was none, and then always repainted pages in high time resolution mode.
Also found this
/*** WinQueryQueueStatus() constants */
#define QS_KEY 0x0001
#define QS_MOUSEBUTTON 0x0002
#define QS_MOUSEMOVE 0x0004
#define QS_MOUSE 0x0006 /* QS_MOUSEMOVE|QS_MOUSEBUTTON */
in pmwin.h, so QS_KEY | QS_MOUSE should be good enough to cover all types of input.
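To illustrate why & versus && matters for a flag test like this, here is a small standalone sketch. It is not the actual code from nsWindow::GetLastInputEventTime (see the attached patch for that); the two function names and EXAMPLE_NON_INPUT_FLAG are hypothetical, and only the QS_* values are taken from the pmwin.h excerpt above.

```c
/* Values as quoted from pmwin.h above. */
#define QS_KEY         0x0001
#define QS_MOUSEBUTTON 0x0002
#define QS_MOUSEMOVE   0x0004
#define QS_MOUSE       0x0006 /* QS_MOUSEMOVE|QS_MOUSEBUTTON */

/* Hypothetical stand-in for any non-input queue flag; the value is chosen
 * only so that it does not overlap the input bits. */
#define EXAMPLE_NON_INPUT_FLAG 0x0010

/* Logical AND: true whenever status is non-zero, whatever bits are set. */
int has_input_logical(unsigned long status)
{
    return (status && (QS_KEY | QS_MOUSE)) ? 1 : 0;
}

/* Bitwise AND: true only if a key or mouse bit is actually set. */
int has_input_bitwise(unsigned long status)
{
    return (status & (QS_KEY | QS_MOUSE)) ? 1 : 0;
}
```

With a queue status that only has a non-input bit set, the logical version still reports input while the bitwise version does not, which matches the "always reported input" symptom.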
For the curious testers, http://temp.weilbacher.org/sm200807181122_wdgtos2_bug258136.zip contains a DLL with the fix. NSPR_OS2_NO_HIRES_TIMER=1 and the content.* prefs shouldn't be necessary with that any more, but let me know if you get different results.
Comment on attachment 330849 [details] [diff] [review]
Good find! Would be nice to have that applied to 1.8.1.x branch, too.
Pushed to Hg (http://hg.mozilla.org/mozilla-central/index.cgi/rev/29309764fdb1) and checked into CVS, too. So this will be in the next nightlies but too late to be in Firefox 3.1a1.
I thought we could use this fix as an incentive to get users of old versions to upgrade. :-) Seriously, I will wait a few days and get it into the 1.8 branch in time for the next release. If I haven't done it in the next two weeks or so, please remind me.
Always feels great to resolve such long-standing bugs. :-)
Got some good feedback in the newsgroup and nothing bad (at least nothing connected to this), so I fixed this on the 1.8 branch, too.
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.1.17) Gecko/20080913 SeaMonkey/1.1.12 (PmW) is much improved over PmW 1.1.11 on http://www.ofb.biz/safari/article/459.html which is only 36k. CPU peg time I estimate somewhere in excess of 1000% improved. Same page in FF 3.0.2 displays virtually instantly, while switching to the page's tab in 1.1.12 causes paint delay and CPU peg of about 8 seconds on 3.2G P4 ICH8 SATA300 using Snap 3.1.8 & X600 Radeon PCIe.
OTOH, I'm CC'd on 676 bugs here currently. It takes 45 seconds to load a query page containing them all in 1.1.12, while only 19 seconds in FF 3.0.2. http://tinyurl.com/msj4y