Closed Bug 338762 Opened 18 years ago Closed 8 years ago

Slow message display and many paint events with SeaMonkey

Product/Component: Core :: DOM: Navigation
Type: defect
Platform: All
OS: OS/2
Priority: Not set
Severity: normal
Status: RESOLVED WONTFIX
Reporter: raydav
Assignee: Unassigned
Keywords: regression
Attachments: 2 files

User-Agent:       Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.7.12) Gecko/20050922
Build Identifier: All SeaMonkey that were tried

I have two email symptoms that occur together and show SeaMonkey performing slower than Mozilla suite.



Reproducible: Always

Steps to Reproduce:
To test, run the Mozilla suite and then repeat with SeaMonkey.

Open email window; split with list in top pane and message in lower pane.
Monitor message pane with PM Spy.
Activate a message in upper pane.
Using cursor keys scroll to next message.
Note how long it takes for the next message to appear and how many paint operations PM Spy reports.
Actual Results:  
With Mozilla suite the time appears the same and is less than one second on a variety of machines from an AMD 550 socket seven to an AMD 2200.  PM Spy reports four paint operations.

With SeaMonkey the time is a function of the hardware, ranging from two seconds on the AMD 2200 to twenty seconds on the 550.  PM Spy reports about thirty-five paint operations.

Expected Results:  
I would expect SeaMonkey to perform at least as well as Mozilla suite.


I initially did this at home and it was confirmed at a SCOUG meeting on another machine.
I tried this several times now and very often my system hung when using PM Spy on SeaMonkey, and I am tired of rebooting now, so I give up. But I have to say that when I monitor the "message pane" (I clicked on the frame where the body gets displayed) I didn't see _any_ WM_PAINT operations reported by PM Spy when going from message to message. I guess you clicked elsewhere, but are you sure that you clicked on exactly the same spot in Mozilla and SeaMonkey?

Normal PM diagnostic tools are badly suited to operate on Mozilla/XUL apps -- remember that the MailNews window of SeaMonkey and Mozilla that appears as a single PM window is really composed of lots (hmm, tens?) of windows in the PM sense...

I could try to create a debug build with paint flashing, that would show how many paints Gecko is doing.
Hi,

Since I was the one that did most of the actual analysis, I guess I should comment.  I would have commented sooner except it took Ray 6 weeks to submit the bug and tell me the bug#. :-)

What I did was disable everything but WM_PAINT tracing.  It is best to minimize the number of traced messages.  I've never had pmspy hang the system, but I have suffered from information overload at times.

The message pane window was selected then the test was done.  The selection was done with the mouse rather than the dialog listbox.  The paint count with Mozilla was 4 as best I can recall.  I suspect the extra paints were to redraw the URLs.  The paint count for Seamonkey was closer to 40.  I was not able to correlate this to anything else that was getting repainted (i.e. the message index or whatever).




Humm.  I just tried to recreate this with the Seamonkey 1.0.2 and I no longer get the large quantity of paint messages.  I forget what I was running 6 weeks ago.  It might have been 1.0.
(In reply to comment #3)
> Humm.  I just tried to recreate this with the Seamonkey 1.0.2 and I no longer
> get the large quantity of paint messages.  I forget what I was running 6 weeks
> ago.  It might have been 1.0.
> 

I have:

Filter\Exclude all messages
Filter\By message\WM_PAINT\Include\Done
Spyon\Select desktop window ....... get red cross, left click on message 
header box in lower pane.

Select message folder with simple text messages.
Left click or use up\down arrow to select different messages in upper pane.

In Mozilla 1.7.12, new message selection takes less than a second.  There are maybe 8 paint lines for the first message and then 3 or 4 for each selection in that folder after the first.

In SM 1.0.2-Peter, there is one paint line when the message is selected, then a pause with the activity monitor pegged, and then more paint lines than will fit on the screen.
OK, I have a debug build 
   Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.0.5)
   Gecko/20060708 SeaMonkey/1.0.3
now with "paint flashing" and "paint dumping" enabled. (The first adds an additional flash when Gecko paints a region on screen, the second outputs a line on stdout.) I cannot do a debug build of 1.7.x at the moment to compare...

I first did some tests with the "Normal" header display with both my normal work profile (that has the mnenhy extension installed) and a clean profile (without mnenhy). When I first select a message, I count about 30 PAINT events most of which paint the header. When I re-select a message that I had selected before this is down to 18 or 19 events (for messages with attachments there are about 5 to 7 more paint events). The only paint event that I would really expect to take some time (for a complex message) would be the one to display the message body and that is really only drawn once. I don't really understand why things like the component bar, the unread and total columns are repainted.
Summary: Slow message display and extra paint lines in PM Spy → Slow message display and many paint events with SeaMonkey
Hmm, I really wonder why the header fields are (unnecessarily?) updated multiple times. E.g. switching between two adjacent messages with the keyboard I see that first the subject field is cleared, then it gets filled with the old subject line, and only the third time it gets painted the new subject line appears. Similarly for the Date, From and User-Agent fields, although these never get cleared but repainted with the old content twice.

I don't think this can be OS/2 specific, so I CC a few guys who have some insight into MailNews and perhaps can answer why this is, if this has really changed from Mozilla 1.7.x to SeaMonkey 1.0.x, and what could be done about it. I don't think that repaints of these small header regions should take much time or CPU, even on slow machines, so solving this problem probably does not solve the "SeaMonkey Mail is so much slower than Mozilla 1.7.x Mail on OS/2"-puzzle.
The only thing I can guess at is that the repainting of text inputs may have changed between versions.
Possible, yes...  Are there OS/2 nightlies that would allow us to narrow down the regressions?  Or is someone willing to do some builds by date (binary search and all)?
bz, yes, there are OS/2 nightlies but note that the regression obviously occurred some time _before_ SeaMonkey 1.0 and there are no public archives that I know of where one could download such old nightlies.

If you have any good ideas which checkins back then could have caused these changes and the slowness in message display I could build stuff that I check out by date.
Peter, I assume http://archive.mozilla.org doesn't have things from the date range you want?

I don't really have any decent ideas; binary search by date to give a starting point is what I was hoping for. :(
(In reply to comment #10)
> 
> binary search by date to give a starting
> point is what I was hoping for. :(
> 
I have not seen the symptom in the latest 1.8x that I could find, if that narrows it any useful amount.

Ray, that is only useful if you also give a full user-agent string from Help->About.
(In reply to comment #10)
> Peter, I assume http://archive.mozilla.org doesn't have things from the date
> range you want?
> 
> I don't really have any decent ideas; binary search by date to give a starting
> point is what I was hoping for. :(

No, archive does not have any contributed nightlies as far as I can see.

When I find time I could try to make a Linux debug build with paint flashing and observe if I see similarly many repaints in mail headers. But I was really hoping one of the guys working on MailNews had one and could take a look.
None of them seem to be cced on the bug, though...
(In reply to comment #12)
> Ray, that is only useful if you also give a full user-agent string from
> Help->About.
> 
This is:
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b) Gecko/20050301

Scroll is normal.
(In reply to comment #14)
> None of them seem to be cced on the bug, though...

I am not sure who is really doing work on MailNews these days so I just cc'd the three people I know. If you know anybody else who could shed some light on this please add him.

But in the end I fear this will just be another OS/2 bug which was discovered too long after the fact (because of the very limited testing the small OS/2 community does on pre-releases) and will never get fixed.
Ray tells me that the oldest nightly that we could find, i.e.

Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b2) Gecko/20050706 SeaMonkey/1.0a

already exhibits the problem for him. So the regression was somewhere between 20050301 and 20050706. Too bad, that's too large a timespan.
Archive now has a few I found, which hopefully I ID'd correctly before upload: http://archive.mozilla.org/pub/mozilla.org/mozilla/nightly/contrib/
I used seamonkey-1.0a.en-US.os2 of 2005070619 and
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b2) Gecko/20050426
which is one of the old builds that Felix found and I see the same number of paint events in PM Spy. But as Ray now refers to a problem with "scrolling" which I don't see anyway, I leave it to him to test further.
Severity: normal → minor
(In reply to comment #19)
> I used seamonkey-1.0a.en-US.os2 of 2005070619 and
> Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b2) Gecko/20050426
> which is one of the old builds that Felix found and I see the same number of
> paint events in PM Spy. But as Ray now refers to a problem with "scrolling"
> which I don't see anyway, I leave it to him to test further.
> 
I use email scrolling as a quick check.  I just unzip and run the exe.  My on-line box is a P4-1.7.
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b) Gecko/20050301 takes less than a second to scroll between two text messages.
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8b2) Gecko/20050426 pegs the activity monitor for maybe twenty seconds - I haven't been counting lately, but it is too slow to be useful.

With the samples I have it seems 1.8b1 is OK, 1.8b2 is not.
 
OK, that narrows it down a bit more but

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&date=explicit&mindate=2005-03-01+00%3A00&maxdate=2005-04-26+08%3A00&cvsroot=%2Fcvsroot

still shows _lots_ of checkins. None of the changes to OS/2 files seem related to me and of the others I only found that the checkins for bug 288117 had some performance-related problems but those seem to have been cleared in the end.

So, unfortunately, that didn't help very much.
It sounds like someone with OS/2 access will have to do some builds or something...
Yes, in Seamonkey email loading message is painfully slow. On my PC (W2000 400 MHz PII) it takes 5 seconds or more with 6 emails in the folder and even for very short messages.
Ray, as you reminded me again about this problem in the newsgroup, another shot in the dark: please try "set NSPR_OS2_NO_HIRES_TIMER=1" in a CMD shell and start SeaMonkey from that. Are you running the ACPI driver or have replaced the TIMER.SYS driver from the original one?

Vladimir, your problem is certainly a different one, and probably has been resolved by the SeaMonkey team in the meantime. This bug is about a very specific problem on OS/2.
(In reply to comment #24)
> Ray, as you reminded me again about this problem in the newsgroup, another shot
> in the dark: please try "set NSPR_OS2_NO_HIRES_TIMER=1" in a CMD shell and
> start SeaMonkey from that. Are you running the ACPI driver or have replaced the
> TIMER.SYS driver from the original one?

I changed everything.  New machine running eCS 2.0b3a.  I also have the original W4 installed for comparison.  With the CMD file below, no change.

rem RunSM.cmd
NSPR_OS2_NO_HIRES_TIMER=1
H:\SM11b4HiMemWeil\seamonkey.exe

On one machine running eCS 1.2rm there was no delay, but I could not duplicate that on any other machine.  I still have that HDD.
(In reply to comment #25)
> rem RunSM.cmd
> NSPR_OS2_NO_HIRES_TIMER=1
> H:\SM11b4HiMemWeil\seamonkey.exe

Does that not need to be "set NSPR_OS2_NO_HIRES_TIMER=1" to take effect in a batch file?
(In reply to comment #26)
> (In reply to comment #25)
> > rem RunSM.cmd
> > NSPR_OS2_NO_HIRES_TIMER=1
> > H:\SM11b4HiMemWeil\seamonkey.exe
> 
> Does that not need to be "set NSPR_OS2_NO_HIRES_TIMER=1" to take effect in a
> batch file?
> 
OOPS.  That's why I clipped the actual file.  I am not to be trusted.
rem RunSM.cmd
set NSPR_OS2_NO_HIRES_TIMER=1
H:\SM11b4HiMemWeil\seamonkey.exe

Doesn't help.

OK, so I spent some hours last night and produced and uploaded three old builds, see
   http://weilbacher.org/temp/old_18_Packages
(I hacked the binaries afterwards so that they really show the relevant date in Help -> About.) Please try them and see which ones are slow so that I know in which direction to go further.
(In reply to comment #28)

20050301 has the delay.

When I lost a HDD I lost my only copy of 1.8b1, which did not have the delay.  I cannot find a copy.  I want that as a reference.  Is there a copy of 1.8b1 somewhere you can point me to?
(In reply to comment #29)
 
> When I lost a HDD I lost my only copy of 1.8b1, 

I found it:
http://releases.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.8b1/contrib/mozilla-os2-1.8b1.zip

It identifies itself as 2005030114.
It does not have the delay.
I hate these build IDs! I checked, using cvs log on client.mk, that the MOZILLA_1_8b1_RELEASE tag was actually created on "2005/02/18 04:41:30". I suppose this means that the OS/2 1.8b1 release that says it is 2005030114 is actually a 2005021804... So, we narrowed the range to between these dates:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&date=explicit&mindate=2005-02-18+04%3A00&maxdate=2005-03-01+00%3A00&cvsroot=%2Fcvsroot

Still a lot of stuff. Let me try to build one in between, like 20050223.
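
The bisection by date being done by hand here can be sketched as a small routine. This is only an illustration: `bisect_builds` and the `is_slow` callback are hypothetical names, and in practice every `is_slow` answer costs a dated checkout, a build, an upload and a manual test. The stand-in oracle in the usage lines is invented so that the example runs; it is not a test result.

```python
from datetime import date, timedelta


def bisect_builds(good, bad, is_slow):
    """Narrow a regression down to two adjacent days.

    good    -- date of a build known NOT to show the delay
    bad     -- date of a build known to show the delay
    is_slow -- callback: build (or fetch) that date and test it by hand
    """
    probes = []
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        probes.append(mid)
        if is_slow(mid):
            bad = mid    # regression happened at or before mid
        else:
            good = mid   # regression happened after mid
    return good, bad, probes


# Stand-in oracle, an assumption for illustration only.
pretend_first_slow = date(2005, 2, 20)
good, bad, probes = bisect_builds(
    date(2005, 2, 18), date(2005, 3, 1),
    lambda day: day >= pretend_first_slow,
)
print(probes[0])  # first date to build: 2005-02-23
print(good, bad)
```

With 11 days between the known-good and known-bad builds, three or four test builds are enough to get down to a single day; narrowing further means bisecting on time of day as well.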
> Vladimir, your problem is certainly a different one, and probably has been
> resolved by the SeaMonkey team in the meantime. This bug is about a very
> specific problem on OS/2.
> 

I do not know but it behaves the same way under W2000.
I have the latest Monkey and the problem is still there.
(In reply to comment #32)
> I do not know but it behaves the same way under W2000.
> I have the latest Monkey and the problem is still there.

And you also see that it is fast in the Mozilla 1.8b1 release and slow in the SeaMonkey 1.1a release? Hmm, there don't seem to be Windows builds on archive.mozilla.org, either, only Linux...
(In reply to comment #33)
> (In reply to comment #32)
> > I do not know but it behaves the same way under W2000.
> > I have the latest Monkey and the problem is still there.
> 
> And you also see that it is fast in the Mozilla 1.8b1 release and slow in the
> SeaMonkey 1.1a release? Hmm, there don't seem to be Windows builds on
> archive.mozilla.org, either, only Linux...
> 
This is what I run:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.1.2pre) Gecko/20070111 SeaMonkey/1.1

BTW, it is a little, little bit better than in the 1.0.x.
(In reply to comment #33)
> And you also see that it is fast in the Mozilla 1.8b1 release and slow in the
> SeaMonkey 1.1a release? Hmm, there don't seem to be Windows builds on

Args, I meant the SeaMonkey 1.0a release not 1.1a...


Well, for Ray to test, I built 2005022300 which is now uploaded to <http://weilbacher.org/temp/old_18_Packages>.
Most of the old builds on a.m.o are ones that I put there, and I only had Linux builds.  ;)
(In reply to comment #35)
> (In reply to comment #33)
> > And you also see that it is fast in the Mozilla 1.8b1 release and slow in the
> > SeaMonkey 1.1a release? Hmm, there don't seem to be Windows builds on
> 
> Args, I meant the SeaMonkey 1.0a release not 1.1a...
> 
> 
> Well, for Ray to test, I built 2005022300 which is now uploaded to
> <http://weilbacher.org/temp/old_18_Packages>.
> 

1.0a are all very bad regarding this issue.
(In reply to comment #35)
> 
> Well, for Ray to test, I built 2005022300 which is now uploaded to
> <http://weilbacher.org/temp/old_18_Packages>.
> 
050223 has the delay.

The last two you provided, When I run email:

"Could not initialize the browsers security component..........."
And then
"You cannot connect to pop.gmail.com - or bugzilla - because SSL is disabled."

Is this something you didn't bother to turn on in the test builds or is it a problem?


Don't know. I did switch --enable-crypto on specifically. Perhaps that was a bug at the time or I have forgotten some trick from two years ago. Anyway, they are just there for testing the MailNews slowness, so I don't think we need to care.

Two more builds are up on <http://weilbacher.org/temp/old_18_Packages> for testing: 2005021800 and 2005022000.
(In reply to comment #39)
 
> Two more builds are up on <http://weilbacher.org/temp/old_18_Packages> for
> testing: 2005021800 and 2005022000.
> 
20050218 does not have delay
20050220 does have delay

You said of 1.8b1, that identifies itself as 20050301:
MOZILLA_1_8b1_RELEASE tag was actually created on "2005/02/18 04:41:30"

1.8b1 has a different MOZILLA.EXE than your 20050218.


(In reply to comment #40)
> (In reply to comment #39)
> 
> > Two more builds are up on <http://weilbacher.org/temp/old_18_Packages> for
> > testing: 2005021800 and 2005022000.
> > 
> 20050218 does not have delay
> 20050220 does have delay

OK, we are getting closer. But that still restricts us to the checkin range
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&date=explicit&mindate=2005-02-18+04%3A00&maxdate=2005-02-20+00%3A00&cvsroot=%2Fcvsroot
Wow, a _lot_ of changes in those two days... Any suggestions from anyone on which patches I should try or which date+time I should give for checkouts?

> You said of 1.8b1, that identifies it self as 20050301:
> MOZILLA_1_8b1_RELEASE tag was actually created on "2005/02/18 04:41:30"
> 
> 1.8b1 has a different MOZILLA.EXE than your 20050218.

Well, mine was 2005/02/18 00:00:00 which gives the beta release build almost five hours of code changes. Although I am a bit puzzled, because I don't see any changes.

But I also found a difference in build setup (I used --enable-shared --disable-static which in those days wasn't used) that might explain this and perhaps also the PSM error message.
I have not added any extensions to the versions I have tested.  However I have Java  1.4.2_05 and Flash 7r63 plugins.  Removing the plugins removes the delay.
(In reply to comment #42)
> I have not added any extensions to the versions I have tested.  However I have
> Java  1.4.2_05 and Flash 7r63 plugins.  Removing the plugins removes the delay.
> 
Putting back either Java or Flash brings back the delay.


Ray, great to see that you have found a workaround.
I cannot see any real connection between MailNews and plugins. Well, unless you are also getting emails that embed Flash and Java stuff _and_ you have Preferences -> Advanced -> Scripts & Plug-ins -> Enable Plug-ins for [X] Mail & Newsgroups activated (which I would strongly suggest you do not, especially not with the OS/2 versions of the plugins that are old and hence full of security holes). I also do not see a connection between the plugins in the code changes in the timespan I posted in comment 41.

Btw, no need to email me privately after adding comments to bug reports on which I am CCd...
I don't think of it as a work around, but rather running SM crippled.  But with no Java and Flash development for OS/2 they are getting less useful anyway.

I do not have Java script enabled for Mail.

I do all testing on dfsee-support@yahoogroups.com because I can be sure it is text only.

Can we consider this bug confirmed, and would that be of any value?



I wasn't talking about JavaScript enabled for Mail but Plugins enabled for mail.
And no, setting this bug to confirmed will not really help any resolution.
(In reply to comment #46)
> I wasn't talking about JavaScript enabled for Mail but Plugins enabled for
> mail.

I had never noticed that setting before.  It was checked, I unchecked it, closed and opened, no change in delay.


(In reply to comment #47)
> (In reply to comment #46)
> > I wasn't talking about JavaScript enabled for Mail but Plugins enabled for
> > mail.

Now I cannot find the plugins for mail switch.  Where is it?
> 
> I had never noticed that setting before.  It was checked, I unchecked it,
> closed and opened, no change in delay.
> 
It seems I can now have usable email or browser plugins, not both.  Is that as good as it is going to get?


(In reply to comment #48)

> It seems I can now have usable email or browser plugins, not both.  Is that as
> good as it is going to get?

Don't know, could be. :-(

I can produce some more builds although uploading them for you is a pain. To summarize, we have found out that

   20050218 00:00          works
   1.8b1 (=20050218 04:41) works
   20050220                doesn't work

Right? So I could try building "20050218 06:00" and "20050218 12:00" next. Let's just ignore the difference in mozilla.exe for now.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
There seems to be a mismatch between the file revisions I get with a dated checkout and the ones that I see listed in bonsai (that so far I have used as a reference). I.e. the checkin of mozilla/layout/style/CVS/Entries version 3.187 is listed on bonsai on 2005-02-17 22:13 while I see it happening when checking out using "20050218 06:14", with a difference of 8 hours.

So the table from comment 49 is really

   MOZ_CO_DATE       source date          result
   20050218 00:00    20050217 16:00       works
   -- (1.8b1)        20050218 04:41       works
   20050220 00:00    20050219 16:00       doesn't work

I am now uploading mozilla-i386-pc-os2-20050218_1200.zip and mozilla-i386-pc-os2-20050218_1800.zip to http://temp.weilbacher.org/old_18_packages/
which in the above table would be

   20050218 12:00    20050218 04:00       ?????
   20050218 18:00    20050218 10:00       ?????

where Ray would have to fill in the question marks...
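
The shift between the MOZ_CO_DATE given to the checkout and the checkin time that bonsai lists can be applied mechanically. A minimal sketch, assuming the offset is a constant 8 hours as observed above (its cause is not established in this bug); `source_date` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Offset observed in this bug between checkout date and bonsai checkin time.
# Treated as a constant here, which is an assumption.
CHECKOUT_OFFSET = timedelta(hours=8)
STAMP_FORMAT = "%Y%m%d %H:%M"


def source_date(moz_co_date):
    """Map a MOZ_CO_DATE string to the bonsai-style source date."""
    stamp = datetime.strptime(moz_co_date, STAMP_FORMAT)
    return (stamp - CHECKOUT_OFFSET).strftime(STAMP_FORMAT)


for co_date in ("20050218 00:00", "20050218 12:00",
                "20050218 18:00", "20050220 00:00"):
    print(co_date, "->", source_date(co_date))
```

The four printed pairs reproduce the rows of the tables above.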

>    20050218 12:00    20050218 04:00       no delay
>    20050218 18:00    20050218 10:00       zip file incomplete?
> 
> where Ray would have to fill in the question marks...
> 

(In reply to comment #51)
> >    20050218 18:00    20050218 10:00       zip file incomplete?

Args, please try again...
> > >    20050218 18:00    20050218 10:00       no delay

Since so much has changed here, I just ran 20050220 the same way I did the new 218s.  Does have delay.
OK, now there is mozilla-i386-pc-os2-20050219_0600.zip in the same dir which is
   MOZ_CO_DATE       source date
   20050219 06:00    20050218 22:00
(In reply to comment #54)
> OK, now there is mozilla-i386-pc-os2-20050219_0600.zip in the same dir which is
>    MOZ_CO_DATE       source date
>    20050219 06:00    20050218 22:00 Delay=Yes
> 

To RAY DAVISON(bug opener):
Is "plain text mail with a string like http://... in it (a string which will be linkified)" involved in your case?
If yes, see Bug 383716, which is a report about more than 7600 "http://www..."s in a plain text mail.
WADA, are you saying that bug 383716 also regressed more than two years ago between 2005-02-18 10:00 and 2005-02-18 22:00? I don't see any evidence for that and in fact I am missing any relation to the problem discussed here or http text in emails...
Ray, a final full package is now at
http://temp.weilbacher.org/old_18_packages/mozilla-i386-pc-os2-20050219_0000.zip
This should finally restrict it to one or two relevant code changes.
(In reply to comment #58)
> Ray, a final full package is now at
> http://temp.weilbacher.org/old_18_packages/mozilla-i386-pc-os2-20050219_0000.zip
> This should finally restrict it to one or two relevant code changes.
> 
219_0000 Delay = No

Great, last known working code date 2005-02-18 14:00, first known broken 2005-02-18 22:00. Bonsai search
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=SeaMonkeyAll&branch=HEAD&sortby=Date&date=explicit&mindate=2005-02-18+14%3A00&maxdate=2005-02-18+22%3A00&cvsroot=%2Fcvsroot

I would say that the two check-ins for bug 273785 look like a hot candidate. I still don't see how they can influence MailNews but as plugins seem to trigger this somehow, I don't see anything else. I will build and upload a docshell DLL with and without the change to confirm and add in some debug info around the changes.

Ray, I didn't think about it before, but can you send me the contents of your plugins directory in a ZIP file (by private email, I hope it's not too large)? If I could reproduce the problem with those it might be easier to debug the problem...
OK, two DLLs for testing:
   http://temp.weilbacher.org/old_18_packages/docshell_20050219_0000_printfs.zip
   http://temp.weilbacher.org/old_18_packages/docshell_20050219_0800_printfs.zip
They should be used together with the Mozilla from the last (20050219_0000) package. The idea is to start Mozilla from the command line like this:

unzip <SOMEPATH>\docshell_20050219_0000_printfs.zip -d components
mozilla > run0000.log 2>&1

unzip <SOMEPATH>\docshell_20050219_0800_printfs.zip -d components
mozilla > run0800.log 2>&1

While it is running test for the delay (the first should be OK, the second one delayed). The debug output contains some calls to clock() so this should tell us some relative timing on your machine, too, to see if the delay is actually in the code that I think it is. Send the debug output to me, please.

I am not really sure yet what this means for the current versions because the code has changed quite a bit, but let's see about that later...
I too have been suffering from this bug. Yesterday I removed all plugins except npnulos2.dll and the problem was still there. Removed npnulos2 and the problem cleared up. Putting all the plugins back and newsgroups are still responsive with no 100% CPU here.
OK, so the logs that Ray sent me show that it takes several seconds for each message to display (up to 10s or so). This is because nsDSURIContentListener::CanHandleContent() gets called four times for each message (3 times for message/rfc822, 1 for text/html). For each of the calls about message/rfc822 the plugins are reloaded which for some reason takes a few seconds (about 3s?) on Ray's machine.

I checked that (for my collection of Bugzilla mails) in a SeaMonkey from 1.8 branch CanHandleContent() is actually called 5 times and again for three of them (type message/rfc822) the plugins are reloaded.
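
As a rough cross-check of these numbers, the per-message delay can be estimated from the call counts. This is back-of-the-envelope arithmetic, not a measurement: the 3 s reload cost is the approximate figure from Ray's logs, `estimated_delay` is a hypothetical helper, and "(other)" is a placeholder for the two calls on the 1.8 branch whose types are not stated here.

```python
def estimated_delay(calls, reload_seconds, reloading_type="message/rfc822"):
    """Seconds spent reloading plugins while one message is displayed."""
    return sum(reload_seconds
               for content_type in calls
               if content_type == reloading_type)


# Content types passed to CanHandleContent() for one message.
old_build = ["message/rfc822"] * 3 + ["text/html"]    # 4 calls
branch_1_8 = ["message/rfc822"] * 3 + ["(other)"] * 2  # 5 calls

print(estimated_delay(old_build, 3.0))   # 9.0
print(estimated_delay(branch_1_8, 3.0))  # 9.0
```

Three reloads at about 3 s each give about 9 s, which fits the "up to 10s or so" seen in the logs, and the estimate is the same for both builds because only the message/rfc822 calls trigger a reload.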

bzbarsky: as all this happens in your code and your fixes in bug 273785 originally caused this problem, it would be great if you could comment.
Assignee: mail → nobody
Severity: minor → normal
Component: MailNews: Main Mail Window → Embedding: Docshell
Product: Mozilla Application Suite → Core
QA Contact: docshell
Version: unspecified → Trunk
> This is because nsDSURIContentListener::CanHandleContent() gets called four
> times for each message

Uh... Odd.  But ok...

> For each of the calls about message/rfc822 the plugins are reloaded

Yep.  Need that to deal with the silly plugin arch we have where we don't know what's there...  Note that we _do_ cache the results in large part, so it most certainly shouldn't be taking 3 seconds.  That's the thing to look into, in addition to why CanHandleContent() is called so many times.

But at a guess, this is a bug in the OS2-specific (or windows-and-OS2-specific) plugin code.
(In reply to comment #64)
> > For each of the calls about message/rfc822 the plugins are reloaded
> 
> Yep.  Need that to deal with the silly plugin arch we have where we don't know
> what's there...

Are there any plans to improve this "silly plugin arch" to be less silly in the near future?

> Note that we _do_ cache the results in large part, so it most
> certainly shouldn't be taking 3 seconds.  That's the thing to look into, in
> addition to why CanHandleContent() is called so many times.

That is due to nsDocumentOpenInfo::DispatchContent() that calls TryContentListener multiple times if it wasn't found the first time. For some reason that I don't understand at all I also see the second call of CanHandleContent() to be called through JS in the call stack, but still originating in line 488 in nsURILoader.cpp (just as the first call).

> But at a guess, this is a bug in the OS2-specific (or windows-and-OS2-specific)
> plugin code

I suspect that one of Ray's (and now Dave's) plugin DLLs is broken in some way which delays the reload, even though I cannot reproduce that with his plugins on my machine. Ray, can you set the environment variables
   set NSPR_LOG_MODULES=Plugin:5,PluginNPP:5,PluginNPN:5
   set NSPR_LOG_FILE=c:\plugin.log
(or any other path), then retest the version with the delay, and send me the plugin.log file? That should contain some more info on why reloading the list of plugins is slow on your machine. Dave, you too. :-)
> Are there any plans

I don't know.  I doubt it.

> calls TryContentListener multiple times if it wasn't found the first time

It should be calling it on different listeners, though.  And most of them should not really be docshells...  I'm not sure I follow the rest of your paragraph there.
> I'm not sure I follow the rest of your paragraph there.

Perhaps it becomes a bit clearer what I meant with these call stacks that I recorded with a breakpoint set at the beginning of CanHandleContent() in SeaMonkey 1.1.2 on OS/2. The second call stack is the one that confuses me...
That sounds like a JS impl of nsIURIContentListener::isPreferred or something.  You could verify by looking at the JS stack.  When stopped at the breakpoint, make sure symbols are loaded for xpconnect and call the DumpJSStack() function.

I wonder what the deal is with the initial claim that that type is not supported, though...
> I suspect that one of Ray's (and now Dave's) plugin DLLs is broken in some way
> which delays the reload, even though I cannot reproduce that with his plugins
> on my machine. 

I believe I am now using all new hardware and software for testing.

> Ray, can you set the environment variables
>    set NSPR_LOG_MODULES=Plugin:5,PluginNPP:5,PluginNPN:5
>    set NSPR_LOG_FILE=c:\plugin.log
> (or any other path), then retest the version with the delay, and send me the
> plugin.log file? That should contain some more info on why reloading the list
> of plugins is slow on your machine. Dave, you too. :-)
> 
Log sent.


> Log sent.
> 
It was made with Moz 219-0600.  
AMD 64 3K
Delay is seven seconds reading dfsee-support@yahoogroups.com


(In reply to comment #68)
> I wonder what the deal is with the initial claim that that type is not
> supported, though...

I didn't find any call to AddCategoryManager that would add message/rfc822 to the supported types... Should that happen somewhere in mail or mailnews?
My problem seems to have been caused by a misconfigured Flash installation. I had copies of Flash in %MOZ_PLUGIN_PATH% and %MOZILLA_HOME%\mozilla\plugins and while Seamonkey was using the one in %MOZ_PLUGIN_PATH% the registry was pointing at the other one. Removing the one in %MOZILLA_HOME% and rerunning flashinst to set up the registry has Flash working and no unusual CPU usage when using the newsgroups.
Attached patch: possible fix
OK, many thanks for that hint. I think then a patch like this should help. At least here I now only see one call to CanHandleContent() for message/rfc822 and it doesn't try to reload the plugins.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Attachment #272332 - Flags: superreview?(bzbarsky)
Attachment #272332 - Flags: review?(bzbarsky)
Comment on attachment 272332 (possible fix)

>+        *aDesiredContentType = (char *)malloc(4 * sizeof(char));
>+        if (*aCanHandleContent)
>+            strcpy(*aDesiredContentType, "*/*");
I don't know anything about the bug but I do know that this is not how you return a string from an XPCOM method. You probably want either:
if (*aCanHandleContent)
    *aDesiredContentType = ToNewCString(NS_LITERAL_CSTRING("*/*"));
or
if (*aCanHandleContent) {
    *aDesiredContentType = static_cast<char*>(NS_Alloc(4));
    if (*aDesiredContentType)
        strcpy(*aDesiredContentType, "*/*");
}
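Both variants above follow the same contract: the callee allocates the string with the allocator the caller will later free it with, and writes the out-param only when there is something to return. The sketch below illustrates that contract in isolation. SketchAlloc, SketchFree, and CanHandleContentSketch are hypothetical stand-ins (plain malloc/free wrappers), not the real XPCOM functions, so that the snippet compiles without any Mozilla headers.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for the XPCOM allocator. Real code would use
// NS_Alloc/NS_Free (or ToNewCString) so caller and callee share one heap.
static void* SketchAlloc(std::size_t size) { return std::malloc(size); }
static void SketchFree(void* ptr) { std::free(ptr); }

// Hypothetical method shape: writes a heap-allocated copy of "*/*" to the
// out-param only when the content can be handled.
// Returns false only if the allocation fails.
static bool CanHandleContentSketch(bool canHandle, char** aDesiredContentType) {
    *aDesiredContentType = nullptr;  // never leave the out-param uninitialized
    if (!canHandle)
        return true;
    static const char kAnyType[] = "*/*";
    // sizeof(kAnyType) is 4: three characters plus the terminating NUL.
    char* copy = static_cast<char*>(SketchAlloc(sizeof(kAnyType)));
    if (!copy)
        return false;
    std::memcpy(copy, kAnyType, sizeof(kAnyType));
    *aDesiredContentType = copy;
    return true;
}
```

The caller owns the returned buffer and releases it with the matching free function (SketchFree here, NS_Free in the real tree).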
Comment on attachment 272332 [details] [diff] [review]
possible fix

No, this is wrong.  Non-docshell handlers for the exact type should take priority over stream converters.  See the code in nsDocumentOpenInfo::Dispatch.
Attachment #272332 - Flags: superreview?(bzbarsky)
Attachment #272332 - Flags: superreview-
Attachment #272332 - Flags: review?(bzbarsky)
Attachment #272332 - Flags: review-
(In reply to comment #75)
> I don't know anything about the bug but I do know that this is not how you
> return a string from an XPCOM method. You probably want either:
> if (*aCanHandleContent)
>     *aDesiredContentType = ToNewCString(NS_LITERAL_CSTRING("*/*"));

Hehe, I was trying to remember some XPCOM methods for this, but as the returned "string" was just a normal char * I didn't bother to look it up. This is of course a lot better. :-)


(In reply to comment #76)
> (From update of attachment 272332 [details] [diff] [review])
> No, this is wrong.  Non-docshell handlers for the exact type should take
> priority over stream converters.  See the code in nsDocumentOpenInfo::Dispatch.

I don't understand, nsDocumentOpenInfo::Dispatch is what asks the CanHandleContent() function that I tried to patch (via TryContentListener). Why should I duplicate code from the caller to the called function? Or do you mean I should add it to Dispatch instead? Or somewhere else?
Dispatch() will do this code (stream converters) itself.  After it tries all possible handlers for the actual content type we're seeing.  Your patch would short-circuit these other handlers, which is not desirable.
Not to mention that outputting */* means you promise to handle whatever the type is converted to.  Which you can't do, of course.
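The ordering described in the two comments above can be illustrated with a toy model. This is only a sketch of the idea, not the real nsDocumentOpenInfo::Dispatch: the Listener struct, DispatchSketch, and the converter table are all hypothetical, and the real code has more steps than shown here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical listener: a name plus the exact content types it handles.
struct Listener {
    std::string name;
    std::vector<std::string> types;
    bool Handles(const std::string& type) const {
        for (const auto& t : types)
            if (t == type) return true;
        return false;
    }
};

// Toy dispatch: every listener gets a chance at the exact type first. Only
// if none claims it is the stream-converter table consulted.
// Returns the name of the chosen listener, or "" if nothing can handle it.
static std::string DispatchSketch(
        const std::vector<Listener>& listeners,
        const std::map<std::string, std::string>& converters,
        const std::string& type) {
    for (const auto& l : listeners)
        if (l.Handles(type)) return l.name;  // exact-type handlers win
    auto conv = converters.find(type);
    if (conv == converters.end()) return "";
    for (const auto& l : listeners)
        if (l.Handles(conv->second)) return l.name + " (via converter)";
    return "";
}
```

In this model, a listener that answered for message/rfc822 by advertising */* would be chosen in the first loop, before any later listener registered for the exact type was asked. That is the short-circuit the review objects to.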
(In reply to comment #78)
> Dispatch() will do this code (stream converters) itself.  After it tries all
> possible handlers for the actual content type we're seeing.  Your patch would
> short-circuit these other handlers, which is not desirable.

Well, there has to be some short-circuit if we want to keep it from searching the plugins first. Where do you suggest to add that?
We don't in fact want to keep it from searching the plugins.  And in any case, the plugin searching happens in IsTypeSupported(), which you're still calling.

The real question is why there are so many calls to nsDSURIListener::CanHandleContent, imo.  I'm pretty sure the answer to this question lies in the mailnews UI, not in the docshell code.

And then again, once the plugin business was fixed this stopped being a problem, so I'm not sure we want to worry about it.
(In reply to comment #79)
> Not to mention that outputting */* means you promise to handle whatever the
> type is converted to.  Which you can't do, of course.

I thought it would mean that there is a handler available to convert it to something that we can handle. Btw, the same would work using text/html instead of */*. ;-)


(In reply to comment #81)
> We don't in fact want to keep it from searching the plugins.  And in any case,
> the plugin searching happens in IsTypeSupported(), which you're still calling.

The difference with the patch is that I am only calling IsTypeSupported() once per message instead of three times. ;-) But why does IsTypeSupported() do the plugin searching anyway? Could that not be moved to Dispatch()? Somewhere near the end (before the "Sixth step") as a last resort measure before calling helper apps? IIUC the "Fifth step" handles the converters so that should help with the present message/rfc822 problem.
> The difference with the patch is that I am only calling IsTypeSupported() once

Which is great, but the real question is why there are three calls to start with.   I'd still like to see stacks for it.  The second one could use a JS stack, for example. Then you should ask the mailnews folks what the deal with nsMsgWindow is.

> But why does IsTypeSupported() do the plugin searching anyway?

Because that's the only way it can answer the question it's being asked?

We could search plugins in Dispatch(), I suppose (much earlier than you propose, of course; between steps 1 and 2), but you're assuming no one else will ever call IsTypeSupported().  Not a great assumption.

In brief, the issues I see here are:

1)  The plugin code's rescanning has issues (a rescan when no files have changed should be really fast).
2)  Mailnews has a complicated dispatch setup of some sort that I'm not following.

We could help work around #1 by setting some sort of flags to avoid more than one rescan per Dispatch(), but it'd be pretty painful and just be a workaround for the real issues above.
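Issue 1 above (a rescan with no changed files should be cheap) could be approached along the following lines. This is a sketch under stated assumptions, not the plugin host's actual code: PluginScanCache and DirSnapshot are hypothetical names, and the snapshot of file names and modification times is assumed to be gathered by the caller.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical snapshot of a plugin directory: file name -> modification time.
using DirSnapshot = std::map<std::string, long long>;

// Hypothetical cache: remembers the snapshot from the last full scan and
// skips the expensive work when the directory looks unchanged.
class PluginScanCache {
public:
    // Returns true if a full (expensive) rescan was performed.
    bool RescanIfChanged(const DirSnapshot& current) {
        if (mHasScanned && current == mLastSnapshot)
            return false;         // nothing changed: cheap early exit
        mLastSnapshot = current;  // stand-in for loading and probing each plugin
        mHasScanned = true;
        ++mFullScans;
        return true;
    }
    int FullScans() const { return mFullScans; }

private:
    DirSnapshot mLastSnapshot;
    bool mHasScanned = false;
    int mFullScans = 0;
};
```

With a check like this, repeated calls for the same message would trigger at most one full scan as long as the plugin directories stay untouched.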
I won't have time to go on debugging this and study the code any time soon...
Assignee: mozilla → nobody
Status: ASSIGNED → NEW
(In reply to comment #84)
> I won't have time to go on debugging this and study the code any time soon...
> 
Will you provide "Job posting" of the skills required and the area to be worked on?  I'll post a help request on the lists that I am aware of.

Is this an OS/2-only problem? I don't see it on W2K.

Would fixing this likely break something else?
Can you build with --enable-extensions=venkman and use Venkman? It should be possible for you to enable chrome debugging and set breakpoints for this. Alternatively, with debug builds, DumpJSStack can work (just be careful not to call it while you hold locks it needs...). If you're desperate and have a decent debugger, you can manually crawl through cx->fp->(pc,script->(main,lineno,filename),down*) or something like that (examples can be found by reading bugs I've touched which mention those variables).
I had this problem last week when I switched from 1.1.18 to 2.0. After a day of trying to get used to the slow switching between displayed mail messages, I went back to 1.1.18. Today I read a post from Ray Davison on news:mozilla.dev.ports.os2 and tried 2.0 again after first removing npnulos2.dll from the plugins directory. That seems to be the cure I need to use 2.0 full time.
Removing npnulos2.dll from the plugins directory seems to make 2.x perform as well as 1.1.18.  That is, it is usable if there are no plugins.
OS/2 is no longer a supported platform.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX