Closed Bug 199443 Opened 21 years ago Closed 21 years ago

leaking GDIs when table cell contains an image, and text is received in multiple packets (testcase available)

Categories

(Core Graveyard :: GFX: Win32, defect, P1)

x86
Windows 98
defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.4final

People

(Reporter: bugzilla, Assigned: darin.moz)

References

()

Details

(Keywords: fixed1.4, memory-leak, testcase, Whiteboard: [adt1])

Attachments

(9 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.3) Gecko/20030312

Mozilla leaks GDI resources when

A table containing both a PNG image (maybe other types as well) and some text is
loaded, 
AND
The HTML document is delivered such that the whole table is not delivered in one
packet 
AND
The image loads before the table has finished being delivered.

If all three conditions are satisfied, mozilla will leak one GDI handle per image.



Reproducible: Always

Steps to Reproduce:
1. Install testcase files (bug.php and volume.png) on a server you have
high-speed access to (local network is good).
2. Open in mozilla
3. Inspect used GDI handles using task manager

Actual Results:  
GDI handle count rises at the rate of 1 per second (until it reaches 10000 or
so, and mozilla crashes).

Expected Results:  
GDI handle count fluctuates as page is reloaded, but stays stable around the
value  before page was loaded.

The URL will work if you have high-speed access to the server (I.E. if you can
download the image in less than ~8 miliseconds.) This is likely not possible
unless you are on a JANET machine. 

If it doesn't work, please instead download the two attachments and run them on
your own server.

This testcase was derived from the application 'Globecom Jukebox'
(www.globecom.se/jukebox), which causes mozilla to leak 3 handles per reload,
with a default reload time of 30 seconds (which crashes mozilla in just under 28
hours) if the playlist window is left open.
Run on a unix webserver. This will not work on windows, as windows cannot
usleep();
Marking NEW as it's detailed and have a testcase.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: mlk, testcase
Priority: -- → P2
Target Milestone: --- → Future
Wherever I say 'table' in my first comment, I mean 'table cell'. Oops.

This bug does not occur in mozilla 1.2.1 .

When I have the bandwidth I'll binary search for it.
*** Bug 203247 has been marked as a duplicate of this bug. ***
This looks very similar to bug 159298.
I couldn't reproduce bug 159298 on 1.3 release, windows XP, while I can
reproduce this one. Will Holcomb reports the same thing in bug 159298 comment 7 . 

This bug may or may not affect 9x, but it definately affects NT, which bug
159298 doesn't.

It is my opinion that these two bugs may be related, but they are not duplicates.
Well, I filed bug 203247, which got duped to this, and that one does affect
Win9x, so I'll assume this bug affects Win9x and NT/2K, probably XP as well.
Attached file GDI usage tool
GDI usage tool for windows 9x. I'm pretty sure it's public domain. (I just
can't remember where I downloaded it from).
I can now confirm that this bug affects windows 98. 

The program "GDI Usage", which I can't  remember who wrote, or where it came
from :( shows that mozilla 1.3 is leaking bitmap objects under 98, which contain
the same image as the image in the test. I'd previously thought it wasn't
leaking under 98, but it turned out I wasn't using the tool properly.

I can also confirm that it affects XP and 2000 by using the method in 'how to
reproduce'.

It would be nice if someone with windows 95 could test this and see if it leaks
there too.

Also, I'd suggest maybe setting this bug to P1, as a GDI leak is a much more
serious problem under 9x than NT. Whereas under NT a leaking app will eventually
run out of handles and not be able to display properly, under 9x all apps share
the same memory, so a leaking app causes the entire system to run out of handles
and fail to display properly. This is what causes the 'system is dangerously low
in resources, which app do you want to kill' box to appear.
OS: Windows 2000 → Windows 98
I just attached CGIObj.exe to a related bug
(http://bugzilla.mozilla.org/show_bug.cgi?id=204374) that will monitor GDI
object usage under win2k.
I am currently running the testcase and monitoring the GDI objects with 
CGIObj.exe.  It has been running for a good 30 minutes and so for the number of 
bitmaps has been steadily hanging around 17-19.  Maybe this particular testcase 
only works for Windows 95/98, however, I've definately seen some leakage on 
win2k (see bug 204374).

I'll let it run overnight and see if anything leaks!
You shouldn't have to leave it for half an hour - there should be a leak every
time it reloads. When I ran it the results were noticable after 30 seconds or so.

Are you running the testcase on a unix machine? Try setting the value of usleep
higher (On a windows machine, you should be able to use sleep(1), and a reload
time of 10 seconds or so, but it will be very much slower). To reproduce this
bug the image must be fully loaded before the second packet that closes the
table cell is received.

I ran the testcase with the GDIobj program on my laptop (XP pro SP1, mozilla
1.3) and found that the totals given by it do not agree with the totals in
windows task manager. While GDIobj showed the totals to be roughly the same, the
 total for mozilla in device manager increased as long as the testcase was running.

I don't think that the GDIobj program is giving the correct values in this case.
[ugh, i've gotten confused between this bug and bug 199443, i'll copy this
comment here]

Ok, it happened again, this time with build 20030514.  I believe it started to
happen after I was running Fb for a while then launched WinCvs 1.2 (which is a
quite a GDI Object hog).  The object counts according to GDIObj were:

Total  Dc  Region  Bitmap  Palette  Font  Brush  Other
682    17  67      408     14       84    92     0

I let it run a little while longer to take some screenshots of the corruption,
and before I quit Fb the final counts were:

Total  Dc  Region  Bitmap  Palette  Font  Brush  Other
734    18  87      410     15       97    107    0

For the first time, Fb actually crashed while shutting down.  The talkback
incident ID is TB20126994G.

While I still had Fb open, other programs were acting up.  For example, I had
notepad open and all the text was invisible.

I will attach some screenshots of some corrupted web pages (seem like bold and
italic fonts were not showing up) as well as my process list (with GDI Objects
column)

!! note that i attached the screenshots to bug 204374, here are the links !!

Taskmanager durring incident
http://bugzilla.mozilla.org/attachment.cgi?id=123421&action=view

Google groups corruption (text missing, icons missing)
http://bugzilla.mozilla.org/attachment.cgi?id=123422&action=view

Slashdot corruption (text missing)
http://bugzilla.mozilla.org/attachment.cgi?id=123423&action=view

IE got sick as well
http://bugzilla.mozilla.org/attachment.cgi?id=123424&action=view
[another comment I left on bug 204374]

Ok, after running the test case over night (browser running on win2k with the
test script installed on a apache/linux server), there was no leak.  The final
numbers are:

total:  87
dc:      4
region: 25
bitmap: 19
palette: 4
font:    3
brush:  32
other:   0

The GDI Object count from task manager was 92, which is close to the total
GDIObj shows given the +/- 3 bitmap fluctuation.  So it seems like there is no
leak in the build I was testing (20030511).

Also, someone mentioned this site in bug 199443:

http://www.hardwarecentral.com/hardwarecentral/

By browsing through the different pages of this site (not refreshing the same
page over and over again) I can the GDI Object use over 1000 fairly quicky.
GDIobj.exe does not show the same total as windows device manager, and insists
that nothing is amiss when mozilla is in the process of bringing down the system. 

It is my strong belief that objects are being leaked, and gdiobj.exe is not
detecting them.

The testcase should leak 1 handle per reload, so 6 tabs of it open at the same
time should cause windows to crash in under 30 minutes.
Attached file Updated testcase
The number in the cell should change every time it is reloaded.

Connecting through a proxy may cause the page to be delivered as one packet
instead of two, which will prevent the leak from occurring.
This may have sumething to do with the memory cache. 

I disabled the cache (browser.cache.memory.enable=false), and the leaks stopped
happening. They started again with the cache reenabled
(browser.cache.memory.enable=true).

Additionally, mozilla may be leaking memory because of this, but I cannot say
anything very conclusive. Running the test for 10 minutes increased mozilla's
footprint by ~1MB, 249 GDI objects.
Lewis -- I am not denying there is a nasty GDI leak going on, I've witnessed it
myself several times, and have uploaded screenshots of the horror in bug 204374
to demonstrate it.

All the testing I've done I've watched both the object counts on the Task
Manager and GDIObj, and while they're never exactly the same, there is a
correlation between the two (I suspect GDIObj does not count some of the objects
the Task Manager does).  I've never seen a case where the number of objects
being reported by Task Manager is wildly different than GDIObj -- GDIObj is
usually a few less (as I write this, Task Manager says 200, GDIObj says 194). 
This is on win2k -- GDIObj might not work properly with XP.

That being said, I've been running a tweaked version of your test script -- I
increased the delay in the script, added a random number to the end of the
meta-refresh URL to prevent caching, added more content in the table, and
running multiple copies of the script in many tabs.  After having it run for 10
minutes or so, Task Manager says 203, GDIObj says 197 -- looks like 3 objects
were leaked.  Are you experiencing more than this in a similar amount of time?
I take back my criticism of GDIobj. You're right that for sensible amounts of
GDI allocations, it's off from task manager by a constant amount.

Running for approximately three minutes, the mozilla went from 99/86
(taskman/gdiobj) to 323/310 GDI objects. 

Interestingly, though, after a while the GDI leak stopped. I had a look at
about:cache?device=memory , which showed the cache to have allocated 245
entries, and 1mb of storage. Counting the entries by hand, however, showed only
32. This cache usage seems pretty similar to the resources leaked. (As an aside,
the people responsible for the memory cache see nothing wrong with the memory
cache keeping a GDI handle open for each item in the cache, and have WONTFIXed
many bugs about it. They see GDI handles as being akin to malloc()ed heap,
rather than file handles. They don't seem to realise that there is a fixed limit
to the number of GDI handles, and it's shared between all processes).

New procedure to reproduce:

set browser.cache.memory.enable=true
set browser.cache.memory.capacity=65536 (64 Mebibytes)
Run the test case. This should be enough storage to leak 16000 handles, enough
to crash the system.

Workaround:

set browser.cache.memory.enable=false

Given its incredibly antisocial behaviour under windows, this strikes me as no
big loss. Squid can provide a replacement memory cache that doesn't cause the
entire system to lock up.
Summary: GDI leak when table cell contains an image, and text is received in multiple packets (testcase availible) → Memory cache leak causes GDI leak when table cell contains an image, and text is received in multiple packets (testcase availible)
Summary: Memory cache leak causes GDI leak when table cell contains an image, and text is received in multiple packets (testcase availible) → Memory cache leak causes GDI leak when table cell contains an image, and text is received in multiple packets (testcase available)
Screenshots before and after.

Notice that after just three minutes, mozilla has leaked 245 GDI handles, and 1
mebibyte of memory.
Yes, that seems to be the case.  With the memory cache disabled, I can't not
drive the GDI object count over 300 when surfing the hardwarecentral site.  The
reason the hardwarecentral site made the GDI object count grow like crazy is
because it has many of these 8 byte images that get stuck in the memory cache --
after you hit the page once, there are 16 of these in cache.  Refresh and there
are 32 of them in cache.  Refresh again and there are 48 of them.  These images
have an expiration date of 12/31/69 19:00:00 -- should these even be in the
cache at all?  ?  They seem to be sucking up a whole lot of resources.


Just because GDIs are stored in the memory cache, doesn't mean the memory cache
is leaking.  Changing summary.

Perhaps the real question is whether ImageLib should be storing GDIs in the
cache.  Cc'ing jdunn.
Summary: Memory cache leak causes GDI leak when table cell contains an image, and text is received in multiple packets (testcase available) → leaking GDIs when table cell contains an image, and text is received in multiple packets (testcase available)
Quick update.  Still using 20030514 and today I reached an all time high GDI 
Object count of 2237 (so says taskman, gdiobj says 2217, 2003 of those 
are "bitmap").  Was unable to check the memory cache as the browser got itself 
into an unusable state.  Also, it was taking up about 110MB of memory in total.

Perhaps there could be a configured limit of the number of items in the memory 
cache, rather than limtied by size?  This would at least put an upper limit on 
the number of handles the cache could consume.

In reply to Comment #25

Yes, storing GDI handles in the memory cache would not be my first choice design.
I think GDI handles should ideally be in a separate cache (possibly a logical
subset of memory cache) that can adapt to the availability of GDI resources
in a user's system. 
I also like the workaround suggested in comment #26. I suspect it would  provide
an adequate soloution.
Why does the memory cache/[ImageLib using the memory cache] use GDI handles at all?

With the memory cache disabled, mozilla still runs fine. This GDI use is
different from the GDI use necessary to display the page - with the memory cache
disabled pages still display fine, and mozilla still uses large numbers of
handles on pages with lots of images, but the handles are freed immediately upon
closing the page, and aren't leaked.

Why is ImageLib storing GDI objects for things it isn't displaying at the
moment? GDI handles aren't like malloc()ed ram, they're more like file handles.
The operating system can deal with programs allocating more ram than there is
available by terminating processes, but it can't deal with (or even easily
detect) over-use of GDI handles. Open too many file or GDI objects at once and
the system can't allocate any more and it crashes. It's common sense (and
neighbourly behaviour) for a program to open as few handles as possible, and
close them when they are no longer necessary. Can you imagine how useless a
squid would be if it kept an open file handle for every file in its cache?

My only guess to the reasoning behind this behaviour is that there is some
overhead involved in fishing an image out of the cache and GDI-ifying it. 
My gut feeling is that the bulk of the overhead would be in decompressing the
image, rather than sending it to the GDI. If this is the case, the majority of
the savings could be kept by storing the decompressed image in the cache, but
closing its GDI object when it is no longer displayed, and opening a new one the
next time the page is loaded.

My experiences over the past week using no memory cache and storing the disk
cache on a ramdisk haven't shown any significant slowdown over the more
conventional memory cache+disk cache on hard disk arrangement. Your results may
differ if you have a faster connection than a modem, but I'd take slower page
loads over crashing the entire operating system any day.

--
As an aside, even without leaking behaviour you can cause mozilla to crash the
whole system by filling the memory cache with small images. (bug 205893)
I cannot repeat this bug on win98, with build 2003050211.
I tried to repeat it with the
"http://homepages.inf.ed.ac.uk/s9902074/mozilla/bug.php" url.
Is this page displaying the latest test case?
I just updated them.

http://homepages.inf.ed.ac.uk/s9902074/mozilla/bug2.php for the old testcase,

http://homepages.inf.ed.ac.uk/s9902074/mozilla/bug.php for the updated one, with
the timeout set to 80 000 microseconds.

http://homepages.inf.ed.ac.uk/s9902074/mozilla/slowbug.php for 10 seconds per
reload and 8 seconds between packet 1 and packet 2.

These testcases will not work unless you can download the image before the
second packet arrives. If there is a proxy (including any transparent proxy your
ISP has forced on you), it will probably send the whole page as one packet, and
the bug will not manifest itself.

An easy way to see if it's working is to refresh about:cache?device=memory , and
see if the total given for cache entries is increasing while the list of cache
entries is not growing longer.
Blocks: 204374
Flags: blocking1.4?
Keywords: nsbeta1
adt: nsbeta1-
Keywords: nsbeta1nsbeta1-
Maybe I am not understanding what people are seeing, but I removed the
<img> and the <table>, leaving just the php loop with the print(rand())
and I see the GDI objects going through the roof on Win2k's Windows Task Manager

Has anyone tried this?  Maybe it isn't the image causing the leak in this
case but the fact that we are printing out the text.
Does this leak continue to manifest itself when you open a new window and close
the old one?

Removing the image and the table, I cannot reproduce this behaviour.
(Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.3) Gecko/20030312).
Testing on windows 9x gives pretty damning evidence that it is the image that is
causing the leak - looking through the (global under 9x) GDI bitmap table, there
are several hundred copies of the leaked image, even after opening a new window,
and closing the original one.
I see this GDI leaking when i read threads at the german IT-News site heise.de.
After about half an hour usage Mozilla runs out of resources, sometimes hangs. I
then have to kill Mozilla or even need to reboot. Talkback never came up.
Closing windows brings no resources back, only a restart of mozilla buys me new
surfing time.

e. g. reading this thread
http://www.heise.de/newsticker/foren/go.shtml?read=1&msg_id=3527260&forum_id=43167
reduces GDI by ca. 30%.

These pages at heise.de consist of a static first part with ads and second, 
slower part delivered by a content managment system 

Win 98 SE, DSL (t-online), traceroute says about my network speed:

  1    60 ms    58 ms    58 ms  217.5.98.60
  2    55 ms    56 ms    57 ms  217.237.154.210
  3    61 ms    61 ms    61 ms  F-EA2.F.DE.NET.DTAG.DE [62.154.18.14]
  4    63 ms    63 ms    64 ms  c6.f.de.plusline.net [213.83.45.33]
  5    65 ms    65 ms    63 ms  c22.f.de.plusline.net [213.83.57.53]
  6    63 ms    63 ms    65 ms  heise.de [193.99.144.71]

Some facts about heise.de:
106.669.706 PageImpressions 19.406.459 Visitors (january) 21,4 % Gecko (march)

I can't use Mozilla 1.4 RC1 with this bug, Netscape 7.02 works fine.


A semi-workaround was pointed out on one of the duplicate bugs: change your
memory cache size to something lower.  If this is changed to, for example,
4096K, then Mozilla seems to behave about the same as it did before the memory
cache size was increased recently.  In other words, the leak has been there for
some time, but it's the mem cache change that made it really apparent.
I modified the testcase and removed the image.	With this (bug1.php)
I still bring win98 to its knees in a couple of minutes and I eat
up thousands of GDI's on win2k.

I have put this test internally at http://haathi/bug1.php
(sorry I don't have access to an external PHP equipped server)
As you're running 9x, could you please download the GDI usage tool, and see what
type of resources mozilla is leaking? I suspect that it leaks REGIONs in some
cases, but haven't been able to reproduce the behaviour.



-- brief users' guide to the GDI usage tool --
start it
start mozilla
press 'take snapshot'
run the testcase for a bit
press 'compare'
I looked through mozilla's use of REGIONs and I don't see any obvious problems
unless a destrcutor is not being called... where in the GFX chain might
something break if text is received in multiple packets?
Ok, played some more.  In looking at nsImageWin.cpp I was playing with
mIsOptimized and if/when this is set the mHBitmap gets *cached*
in the nsImageWin object.  So for the heck of it... I disabled it on
win98/95 altogether (not sure if the image > 16M).  I am sure this is
gonna HURT performance but it does stop us from *cache'ing* gdi's.

So in TESTING this with bug1.php I noticed that THIS PATCH made a difference.
Meaning with this patch, bug.php & bug1.php no longer gobble up GDI resources.

Which leads me to believe it isn't the TEXT that is generating GDI's.  Looking
at what else could be generating images all I see is 
-the little *load* icon in the tab 
-the X/stop-load icon in the toolbar.
-horizontal scroll bar at the bottom...
When i visit

http://www.heise.de/newsticker/foren/go.shtml?read=1&msg_id=3651850&forum_id=44043&showthread=-1

and toggle between threaded an non threaded view by clicking "Thread-Anzeige
einblenden" and "Thread-Anzeige ausblenden" after some time GDI Usage shows an
consumption of one Bitmap per click. No horizontal scroll bar involved.

When i follow a thread on that page i lose 3 Bitmaps. 

Surfing heise.de acording to GDIUsage.exe i lose bitmap, font and region
resources, but i don't know how much of that is table-related or more
a general GDI leak.

Is there a build with the patch to disable GDI-cacheing downloadable?
Traced a bit further... What I am seeing is that GIF decoder is creating
the nsImageWin's over and over for the following 3 URI's.  They don't
get deleted (i.e. the mHBitmap is never deleted till we shutdown. 

The URI's are
{"jar:resource:///chrome/modern.jar!/skin/modern/global/scrollbar/slider-hrz.gif"}
{"jar:resource:///chrome/modern.jar!/skin/modern/global/scrollbar/thumb-hrz-grip.gif"}
{"jar:resource:///chrome/modern.jar!/skin/modern/global/scrollbar/btn-rit.gif"}
So it looks like imgLoader is getting called with something like
chrome://global/skin/scrollbar/btn-lft.gif
LoadGroup's request flags of nsIRequest::LOAD_BYPASS_CACHE
so we 'doom' the current cache entry (line 301 imgLoader.cpp)

and we then go on to reload the gif... 

nsGIFDecoder2::HaveDecodedRow, does a do_CreateInstance and
creates the decoder->mImageFrame (line 390 nsGIFDecoder2.cpp)
which creates the nsImageWin.  

Now I need to figure out when nsImageWin's destructor should
be getting called to release the previous version.  Am assuming
it is in the DOOM, but nto sure.
the entry->Doom (imgLoader.cpp line 301) doesn't release 
the image (via calling imgRequest::~imgRequest) because the
request's & cache's->entry mRefCnt==2

NOTE: I have been surfing and this testcase seems to be the only
one so far... that hits this piece of code...  i.e.
      // LOAD_BYPASS_CACHE - Always re-fetch
      if (requestFlags & nsIRequest::LOAD_BYPASS_CACHE) {
        entry->Doom(); // doom this thing.
        entry = nsnull;
        NS_RELEASE(request);
        request = nsnull;
      } 

Each refresh will potentially leak 4 imgRequest's (which is 4 nsImageWin's & 4
GDI's), so maybe this specific test is unique and is NOT a general leak.



The way I'm looking at GDI leaks at the moment is to ignore any uniqueness or
any minor/small perceived improvement. If any leaks that are found get fixed,
then Mozilla can only get better...
this patch seems to do the trick.  the problem: imgRequest::mCacheEntry was
sometimes not getting nulled out.  we'd call Doom on the cache entry, and
orphan the imgRequest, leaving it owning the cache entry with the cache entry
owning it.  classic memory leak.  the solution: make imgRequest release its
reference to mCacheEntry whenever we decide to Doom it.
imgRequest::mCacheEntry may be null in some cases or perhaps even != entry in
others, so this patch is a little bit more careful.  basically, the same patch
though.
Attachment #126312 - Attachment is obsolete: true
-> me
Assignee: kmcclusk → darin
Priority: P2 → P1
Target Milestone: Future → mozilla1.4final
Attachment #126314 - Flags: review?(pavlov)
Attachment #126314 - Flags: superreview?(bzbarsky)
Comment on attachment 126314 [details] [diff] [review]
v1.1 patch: revised slightly

I verified that this patch fixes the problem seen by bug.php.
Tested on both win2k & win98

Thanks for helping with this Darin & co.
Comment on attachment 126314 [details] [diff] [review]
v1.1 patch: revised slightly

Most excellent.  This looks like it should also fix bug 193865 and the bugs it
blocks...
Attachment #126314 - Flags: superreview?(bzbarsky) → superreview+
Attachment #126314 - Flags: review?(pavlov) → review+
Blocks: 193865
Comment on attachment 126314 [details] [diff] [review]
v1.1 patch: revised slightly

seeking drivers approval for 1.4... simple patch for a nasty bug.
Attachment #126314 - Flags: approval1.4?
Comment on attachment 126314 [details] [diff] [review]
v1.1 patch: revised slightly

a=asa (on behalf of drivers) for checkin to the 1.4 branch.
Attachment #126314 - Flags: approval1.4? → approval1.4+
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Woohoo!  How quickly can I get my hands on a build with this fix incorporated? 
If it fixes the overall repainting issues I deal with, I'd gladly make a nightly
build my daily browser versus 1.4RC2.
*** Bug 193865 has been marked as a duplicate of this bug. ***
adt:  nsbeta1+/adt1.  Please land on the branch and the keyword fixed1.4
Keywords: nsbeta1-nsbeta1+
Whiteboard: [adt1]
fixed1.4
Keywords: nsbeta1+fixed1.4, nsbeta1-
Many thanks to all who found, investigated, tested, fixed, and implemented for
this bug (and associated fixed bugs). I use WinXP, so I didn't notice this
problem as much, but I'm sure the Win9x users will be extremely happy.
win2k observations (2003.06.24.08-1.4 commercial build):

a. before loading the test url
(http://homepages.inf.ed.ac.uk/s9902074/mozilla/bug.php), GDI's were at 170. i
was looking at this page at the time (no multiple tabbed browsers).

b. while loading the test url, GDI's were at 158-160. i kept it at that url for
about 1min.

c. leaving the test url (returning to this page), GDI's went back up to 175.

d. i opened Preferences (GDI's went up to 225) and cleared the cache. after
closing the Preferences window, GDI's went to 188.

e. i opened up 7 tabs (tabbed browsers) to these sites: espn.com, cnn.com,
news.bbc.co.uk, wired.com, apple.com, sfgate.com, aol.com. after the pages
finished loading, GDI's was at 776. after i close all the tabs (only espn.com
remaining), GDI's fell back to about 550.

will test win98 in a bit...
using the gdi tool attached here, here are my observations with the same build
on win98se (reporting *total* GDI --i added all the fields, if that's correct,
since there wasn't a "total" field in the tool itself):

a. before loading the test url
(http://homepages.inf.ed.ac.uk/s9902074/mozilla/bug.php), GDI's were at 422. i
was looking at this page at the time (no multiple tabbed browsers).

b. while loading the test url, GDI's were at 415. i kept it at that url for
about 1min.

c. leaving the test url (returning to this page), GDI's went back up to 427.

d. i opened Preferences (GDI's went up to 457) and cleared the cache. after
closing the Preferences window, GDI's went to 435.

e. i opened up 7 tabs (tabbed browsers) to these sites: espn.com, cnn.com,
news.bbc.co.uk, wired.com, apple.com, sfgate.com, aol.com. after the pages
finished loading, GDI's was at 1012. after i close all the tabs (only espn.com
remaining), GDI's fell back to about 856.
I'm not sure if it's due to this bug or one of the other GDI resource bugs.
Using today's (20030624) build of Mozilla Firebird, after surfing for an hour of
so on Win98SE, my GDI resources had gone from 67% to about 40%.  It seems to me
that there is still some type of GDI leak happening.
Comment #60 - don't forget that Mozilla will NOT be the only application on your
machine that may be leaking GDI resources.

Comment #59 and #58 - Good work. Looks like Prefs are leaking a bit, and I don't
recall a bug on that one. As for tabs, I know there is a bug for that one,
although I didn't realise it was that bad a leak.

May I suggest that further discussion be moved to Bug #204374 (and I think the
info on GDI numbers should be copied over as well).
comment 60 :
If you suspect a memory leak in 9x, download the "GDI usage tool". Use its
browser to look on any resource types that have a suspiciously large allocation.
If there are multiple copies of the same thing, this is likely to be a leak, and
(IIRC) the GDI usage tool can tell you which application owns the handle (and
therefore is the one causing the resource leak). If it can't do this (I honestly
can't remember), the contents of the resource can still give you a clue.

Example:

   System resources are 40% or so.
Open GDI usage tool.
   It gives a total of 4000 images allocated
Browse images
   Lots of them seem to be an image off a website opened in mozilla - mozilla
guilty!
   Lots of them seem to be part of a shareware program with a 133+ interface -
mozilla innocent

comment 58, comment 59:
As you have an @netscape.com email address, I'm guessing you're in the US? If
this is the case (or, in fact, if you're anywhere but a British university), the
testcase may not work properly (and therefore not show up the bug even on a
build where it isn't fixed yet). I recommend testing a build known to be
vulnerable (any of the mozillas between (about) 1.3 release and 1.4 rc2) to
confirm that the testcase works. If not, you'll have to find a server closer to
your machine. There seems to be an internal copy of the test case described in
comment 36, which may help if this is the case.

Apologies if you already knew this.

Might I also suggest a few final tests: 
open a new window, close the old one (with the espn tab) (or open a new tab and
close the old one), and then check the GDIs. This should eliminate any
contribution from content being currently displayed.
lewis: note that the memory cache holds references to GDI objects.  until the
memory cache fills up, the number of GDI objects in use by mozilla will not go
down.  you can test this by opening a window with espn.com, then closing the
window, and finally clearing the cache.  if after these steps the GDI object
count does not go down, then there is still a serious bug.  otherwise, the only
"bug" is maybe that the cache should not hold onto so many GDI objects... or at
least that the cache should maybe limit the number of GDI objects it owns.
Daris, why is cache keeping GDI objects around?. I'd trade stability by speed :-)
The memory cache is used to store decoded images
with each image we store, we also store the HBITMAP (GDI resource)
for performance reasons.  If you go to a site like espn, just the
main page will cause us to cache 50images (50 gdi resources).
Now because the memory cache maximum size is large (in most cases)
we will continue to cache images (and gdi's) till we go over the
storage size at which point we will probably remain constant.

If you have 500 images in your cache at that point, we hold onto
500 gdi resources.  

So while it appears to be leaking, it may or may not be leaking.
We know about the GDI resources we are holding in the cache.

This bug (and its testcase) showed us a leak that we actually had 
and we fixed it.
Jim, I see the point, but the problem is that in Win9x, GDI resources are a
fairly scarcy resource (¿64Kbytes?). With current approach, opening a pair of
"heavy" web pages can kill my OS, eating GDI as if there was no tomorrow.

I'd rather prefer a slower performance if I'd could navigate without keeping an
eye in the "GDI resource meter" :-(.

By the way, how much performance impact would be have without GDI caching?.
Anybody has numbers?.
we are looking at the #'s and are considering options.

One suggestion (of mine) is to lower memory cache size.
Right now mine (on win98) is set to 50Meg... but if I was
browsing instead of debugging, I would probably keep it at
10Meg or less

Also I would *love* it if I could get a GDI usage tool that
worked with my version of win98.  I don't have sp2 (long story)
specifically I am trying to find out what is leaking via the
testcase in bug 159298
Darin: I'm not certain on this, but wouldn't clearing the cache clear out all
the leaks too? When I did the test a bit, then cleared the cache by switching it
off and on again (Is there a better way of doing this now the clear cache button
has been removed), mozilla's GDI usage fell sharply. Is there a better way to
find the leaked resources than manually counting the entries the cache thinks
it's storing, and subtracting this number from the GDIs mozilla is using?

In my opinion, the memory cache should only be storing GDIs for images that are
being displayed at the moment, and GDIs should be looked up by their image,
rather than URL, so that 2000 identical transparent GIFs only need one handle.
For why indexing on the URL is a bad idea, have a look at bug 205893 . 

Possibly a separate cache for GDI handles may be necessary? Still keep the image
data in the memory cache, indexed by URL, but have a table of GDI handles, with
some hash of the image data, width, height, etc. Each memory cache entry,
instead of having its own GDI handle, has a pointer to one of the GDI handles in
the table. Any number of GIFs of the same colour would take only one GDI IMAGE.
> but wouldn't clearing the cache clear out all the leaks too?

if clearing the cache frees gdi resources, they were obviously not leaked.

"Leak" means that mozilla has no reference to them and never frees them.

Not necessarily - what happens here (in my experience) is that the cache
capacity becomes fully taken up with images that are not referenced anywhere, so
the legitimate entries can't fit in, and the cache can't automatically clear
them out. I'd call this a leak, even if clearing the cache manually clears
everything (leaked or legitimate) from the cache. 

Which I'm not now sure it does, as toggling browser.cache.memory.enable doesn't
actually clear the cache. How does one clear the memory cache on builds without
the 'clear memory cache' button?
The prefbar - http://www.xulplanet.com/downloads/prefbar/ - has a clear memory
cache button.
Forget everything I said in comment 70 - I apologise for wasting everyone's time.

In builds without a 'clear memory cache' button, the 'clear cache' button clears
both caches, not just the disk cache.

Clearing the memory cache does not get rid of the leaked entries.
mozilla1.4 shipped. unsetting blocking1.4 request.
Flags: blocking1.4?
Flags: blocking1.4.x?
Flags: blocking1.4.x?
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: