I just downloaded and started running M11 under Win95 OSR2. After my system crashed once, I watched the effects of the Mozilla browser more carefully. Before I ran the executable, my System, User, and GDI resource levels were all above 90% (I have 128MB RAM). After I started M11 and began browsing the web, the resource levels went steadily down. After going from MozillaZine to CNN to a CNN story to Slashdot and finally to Bugzilla, my resources had plummeted to: System: 31%, User: 80%, GDI: 30%. I then closed M11 (through the close button) before my system locked up; it was getting rather unresponsive. After exiting, my resource levels recovered somewhat, but not completely: System: 82%, User: 82%, GDI: 89%. Progressively browsing web pages seems to slowly munch memory in M11. It is unusable after a good 3 minutes of web browsing.
I also see this problem on Win98 (also with 128MB RAM) using the 11-19-09 1999 build. I watched the Resource Meter as pages were loading and noticed that the GDI resources didn't go down steadily. As Slashdot was loading, for example, GDI resources went from 55% to 57% to 52%. Apparently it frees some GDI resources in some cases, but then uses more than it frees when it loads the next page.
I am seeing some resource leakage on both WIN95 and WIN98 with M11 when going between Slashdot and CNN, but not as dramatic as described in this bug. We need to narrow down which element or elements in the page cause the leak; it seems to be specific to these pages. The standard test0.html - test15.html do not leak, and many other web sites don't leak either.
I must disagree with the last assessment of the problem. It is quite a dramatic leak for me. I must also disagree with the statement that the standard test0-test15 do not leak. I was not able to see any resource leaks for test0-test13, but test14 and test15 (XML sorting and xml something) did indeed leak on my system (again M11, Win95 OSR2). I wonder whether the leak is in the network code, since test0-test13 reside locally on the filesystem while tests 14 and 15 redirect to pages on mozilla.org. Is it a coincidence that the two tests that use the network leak and the 14 that don't use it don't leak?
Summary: Major memory leaking in M11 under W95 OSR2 → [DOGFOOD]Major memory leaking in M11 under W95 OSR2
This memory leak is also preventing QA test scripts from running. We need this investigated and fixed ASAP. Gerardo, please provide more data. Do your tests run OK on Win98 and WinNT? Is this really ONLY Win95? Adding several folks to cc to help out. PDT+
I am building with purify enabled to try and track down the resource leakage.
After talking with troy and beard, it sounds like the problem may be that we are not releasing nsIImages. If this is true, the nsIImages are holding references to GDI objects that are never released.
I know that neeti and smfr were poking around some of the imglib leaks...
I spoke with kevin/don -- they think they can nail this by 12/3, but that's being very aggressive in my opinion.
Whiteboard: [PDT+] → [PDT+] 12/3
I have located and fixed 4 places in the XP code where we were leaking nsIImages. Each nsIImage holds a handle to an HBITMAP, which was never released because its containing nsIImage was never released. As a result of this leak, we were never releasing any image that appeared within the content area. My fix reduced the number of nsIImage leaks dramatically, but there are still a few nsIImage leaks that need to be tracked down. In addition, a code inspection of nsFontMetricsWin suggests we may also be leaking HFONT handles there.
Some additional detail on the resource leak fixes. The biggest leak was caused by a missing NS_RELEASE of an nsIImage in nsImageFrame::Paint (/layout/html/base/src/nsImageFrame.cpp). Other areas with refcount problems that caused nsIImage leaks were: painting a bullet with an image (nsBulletFrame::Paint in /layout/html/base/src/nsBulletFrame.cpp); tiling a background image (nsCSSRendering::PaintBackground in /layout/html/style/src/nsCSSRendering.cpp); and painting a titled button (/layout/html/xul/base/src/nsTitledButtonFrame.cpp). Each leaked nsIImage on WIN32 held a handle to an HBITMAP that was never released. I still need to test this on WIN95 to see how much it improves the resource leak problem.
With these changes on Mac, I finally see image data being freed, for the first time. Yay.
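To make the missing-NS_RELEASE pattern concrete, here is a minimal sketch assuming the usual XPCOM getter convention, in which the callee AddRefs the pointer it hands back and the caller must release it. FakeImage and FakeImageLoader are hypothetical stand-ins, not the real nsIImage or image-loading interfaces; a static counter models outstanding HBITMAP handles.

```cpp
#include <cassert>

// Hypothetical stand-in for a refcounted image that owns a GDI bitmap.
// sLiveBitmaps models HBITMAP handles that are still allocated.
struct FakeImage {
  static int sLiveBitmaps;
  int mRefCnt = 0;
  FakeImage() { ++sLiveBitmaps; }   // models creating the bitmap
  ~FakeImage() { --sLiveBitmaps; }  // models DeleteObject on the bitmap
  void AddRef() { ++mRefCnt; }
  void Release() { if (--mRefCnt == 0) delete this; }
};
int FakeImage::sLiveBitmaps = 0;

// Hypothetical loader whose getter returns an already-AddRef'd image:
// the caller owns one reference and must release it.
struct FakeImageLoader {
  FakeImage* mImage;
  FakeImageLoader() : mImage(new FakeImage) { mImage->AddRef(); }
  ~FakeImageLoader() { mImage->Release(); }
  FakeImageLoader(const FakeImageLoader&) = delete;
  FakeImageLoader& operator=(const FakeImageLoader&) = delete;
  void GetImage(FakeImage** aResult) { mImage->AddRef(); *aResult = mImage; }
};

// Leaky paint: the reference obtained from GetImage is never dropped,
// so the image (and its bitmap) can never be freed.
void PaintLeaky(FakeImageLoader& aLoader) {
  FakeImage* image = nullptr;
  aLoader.GetImage(&image);
  // ... draw the image ...
}

// Fixed paint: balance the getter's AddRef (the NS_RELEASE step).
void PaintFixed(FakeImageLoader& aLoader) {
  FakeImage* image = nullptr;
  aLoader.GetImage(&image);
  // ... draw the image ...
  image->Release();
}
```

Each call to PaintLeaky leaves one extra reference behind, which is why every image drawn in the content area stayed alive.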
Kevin: the fixes look good. However, I wonder why you didn't use nsCOMPtr to fix the leaks. Use of nsCOMPtr originally would have prevented these leaks from ever happening.
Could this be the cause of bug 20416?
Simon: I think the original code was written before nsCOMPtr existed. You're right, it probably would have been a good idea to switch it over to nsCOMPtr. At the time I was trying to make minimal changes to the source code in these areas since I'm not the owner, but it does make sense to go back and switch the code to nsCOMPtr to prevent future leaks.
Using Purify, I noticed that we are leaking a lot of HRGN (region) handles. We are leaking them because Init is called multiple times on the same nsRegionWin instance. The fix is easy, but I need to determine whether it is proper to call Init multiple times on the same region.
I don't know if this helps or hinders, but after seeing the code changes committed for some of the release problems Kevin found, I tried the 12-03-99 nightly build to see if I could notice any improvement. Unfortunately, that build is still leaking GDI resources like a sieve. I started at 90% GDI (again 128MB RAM). I opened Mozilla and loaded mozillazine.org, then went to www.slashdot.org, and finally www.cnn.com. Before CNN's home page finished loading, my GDI resources were down to 20% and the system was sluggish when I tried to shut down Mozilla. The bulk of the leak appears to be intact. :(
Fixed a bug where we were leaking a WIN32 region handle (HRGN) each time a new page was loaded. I modified nsRegionWin::Init so it can be called multiple times on the same instance without leaking HRGN handles. The view manager reuses nsIRegions in nsViewManager::ProcessPendingUpdates(nsIView* aView) by calling Init on an existing instance. Mac and Linux already handled multiple Init calls on the same instance.
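The nsCOMPtr suggestion can be illustrated with a small intrusive smart pointer that releases its reference on every exit path. StrongRef below is a hypothetical, simplified stand-in and not nsCOMPtr's real API; its StartAssignment method plays roughly the role getter_AddRefs plays for nsCOMPtr, adopting an already-AddRef'd out-parameter without adding another reference.

```cpp
#include <cassert>

// Hypothetical, simplified smart pointer in the spirit of nsCOMPtr.
template <class T>
class StrongRef {
 public:
  StrongRef() = default;
  ~StrongRef() { if (mRaw) mRaw->Release(); }  // release on every exit path
  StrongRef(const StrongRef&) = delete;
  StrongRef& operator=(const StrongRef&) = delete;

  // For getters that hand back an already-AddRef'd pointer: drop any old
  // reference, then let the getter write directly into mRaw (no extra AddRef).
  T** StartAssignment() {
    if (mRaw) { mRaw->Release(); mRaw = nullptr; }
    return &mRaw;
  }
  T* operator->() const { return mRaw; }

 private:
  T* mRaw = nullptr;
};

// Hypothetical refcounted image; sLiveBitmaps models outstanding HBITMAPs.
struct FakeImage {
  static int sLiveBitmaps;
  int mRefCnt = 0;
  FakeImage() { ++sLiveBitmaps; }
  ~FakeImage() { --sLiveBitmaps; }
  void AddRef() { ++mRefCnt; }
  void Release() { if (--mRefCnt == 0) delete this; }
  void Draw() const {}
};
int FakeImage::sLiveBitmaps = 0;

// Hypothetical getter: the caller receives an owning reference.
void GetImage(FakeImage* aShared, FakeImage** aResult) {
  aShared->AddRef();
  *aResult = aShared;
}

// With the smart pointer, an early return cannot leak the reference.
void Paint(FakeImage* aShared, bool aEarlyReturn) {
  StrongRef<FakeImage> image;
  GetImage(aShared, image.StartAssignment());
  if (aEarlyReturn) return;  // destructor still releases
  image->Draw();
}
```

The design point is that ownership lives in the destructor, so adding a new return path to a Paint method cannot silently reintroduce the leak.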
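The fix described above amounts to making Init free any handle it already owns before creating a new one. A minimal sketch, assuming hypothetical FakeCreateRectRgn/FakeDeleteObject stand-ins for the Win32 CreateRectRgn/DeleteObject calls so it runs on any platform; RegionSketch is illustrative and not the actual nsRegionWin code.

```cpp
#include <cassert>

// Hypothetical stand-ins for CreateRectRgn/DeleteObject; gLiveRegions
// models the number of HRGN handles currently allocated.
using FakeHRGN = int;
static int gLiveRegions = 0;
static int gNextHandle = 1;
FakeHRGN FakeCreateRectRgn() { ++gLiveRegions; return gNextHandle++; }
void FakeDeleteObject(FakeHRGN) { --gLiveRegions; }

// Hypothetical region wrapper modeled on the nsRegionWin fix.
class RegionSketch {
 public:
  RegionSketch() = default;
  RegionSketch(const RegionSketch&) = delete;
  RegionSketch& operator=(const RegionSketch&) = delete;
  ~RegionSketch() { Reset(); }

  // Safe to call repeatedly on the same instance: the handle from any
  // previous Init is freed before a new one is created.
  void Init() {
    Reset();
    mRegion = FakeCreateRectRgn();
  }

 private:
  void Reset() {
    if (mRegion) { FakeDeleteObject(mRegion); mRegion = 0; }
  }
  FakeHRGN mRegion = 0;
};
```

Without the Reset call in Init, reusing one region object across page loads would strand one handle per call.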
Automation continues to stop on Win 95, every 30~40 test cases, because system resources are down to 0%.
I've tracked down the image leak to the EventStateManager. To recap the behavior I was seeing: I go to a page that has a hyperlink on it and click the link to load a new page. On application exit, a PresContext and an EventStateManager are leaked. The PresContext holds on to an image group, the image group holds on to the ImageManager, and so the ImageManager never releases its cache of images. The problem is the strong reference to gCurrentlyFocusedPresContext in nsEventStateManager.cpp. The PresContext pointed to by gCurrentlyFocusedPresContext never gets destroyed because it is released only when the last event state manager is destroyed, and the last event state manager never gets released because this PresContext is never destroyed. In other words, gCurrentlyFocusedPresContext creates an ownership cycle between the EventStateManager and the PresContext. In my local source I changed gCurrentlyFocusedPresContext to be a weak reference, and everything gets released on exit. There is still the question of how we should solve this problem. I assume we used a strong reference for gCurrentlyFocusedPresContext because the event state manager can outlive the PresContext, so simply changing it to a weak reference is not a good idea. Once this problem is solved we should see the resources returned on application exit. As pages are visited we cache their images, so available resources will go down as more pages are visited and then level off once the image cache is full. We may need to shrink the image cache if we are running out of resources on WIN95/WIN98 within a single run of Mozilla. CC'ing joki
CC'ing sarri since he knows about gCurrentlyFocusedPresContext. I temporarily reduced the image cache to 100K instead of 2Meg, and I was able to see a lot of images being destroyed as I browsed CNN, Slashdot, MozillaZine, etc. This indicates that the image cache is emptied when it becomes full, which means the refcounting on nsIImages must be correct now. However, I think we need to reduce the current 2Meg cache to 1Meg to prevent WIN95/98 resources from dropping too low during browsing.
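A minimal sketch of the cycle described above, using std::shared_ptr and std::weak_ptr as stand-ins for XPCOM strong and weak references (an assumption; Mozilla's own reference machinery differs). The structs, globals, and VisitPage are hypothetical models, not nsEventStateManager.cpp code. Variant A keeps a strong "focused" pointer that is cleared only when the last event state manager dies; variant B uses a weak pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical model; counters stand in for leaked objects.
static int gLivePresContexts = 0;
static int gLiveESMs = 0;

struct EventStateManager;

struct PresContext {
  std::shared_ptr<EventStateManager> mESM;  // the PresContext owns its ESM
  PresContext() { ++gLivePresContexts; }
  ~PresContext() { --gLivePresContexts; }
};

// Variant A: strong "currently focused" global. Heap-allocated and never
// destroyed so static destruction order cannot affect the sketch.
std::shared_ptr<PresContext>& FocusedStrong() {
  static auto* sRef = new std::shared_ptr<PresContext>();
  return *sRef;
}
// Variant B: weak global; it never keeps a PresContext alive.
static std::weak_ptr<PresContext> gFocusedWeak;

struct EventStateManager {
  EventStateManager() { ++gLiveESMs; }
  ~EventStateManager() {
    // Variant A drops its strong reference only when the last ESM dies,
    // but that ESM is owned by the focused PresContext: a cycle.
    if (--gLiveESMs == 0) FocusedStrong().reset();
  }
};

// Loads one page, focuses it, then drops the caller's reference.
void VisitPage(bool aUseWeakRef) {
  auto pc = std::make_shared<PresContext>();
  pc->mESM = std::make_shared<EventStateManager>();
  if (aUseWeakRef) {
    gFocusedWeak = pc;
  } else {
    FocusedStrong() = pc;
  }
}
```

The weak variant frees everything, but any code that later uses the focused context must call gFocusedWeak.lock() and check for null, which is exactly the concern raised about switching to a plain weak reference when the event state manager can outlive the PresContext.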
Instead of reducing the image cache size, is there some way to detect system resources are almost exhausted and flush out some images or prevent further allocations?
michael: Yes, that would be a better idea. In addition, the 2Meg image cache size is currently hardcoded and should be set through a pref instead. I'll reduce the hardcoded cache to 1Meg for M12 and file two bugs: one for setting the cache size through preferences and one for detecting the low-resource condition.
Potential breakthrough observations: I got a copy of the 120999 Win build to test with and discovered something that may change the way this bug is being looked at. I saw in another bug that one of the Netscape team was using memory+ to track a possible memory leak, so I got the same tool and used it to look at this bug. The results REALLY surprised me. I used memory+ to watch the amount of physical and virtual memory that Mozilla consumes. Yes, it does grow steadily with usage, but nothing like what would cause the system lockups Mozilla has been causing me. I opened Win95's Resource Monitor and let it run on the taskbar, and arranged the memory+ display so I could track the User, System, and GDI resource levels while simultaneously watching Mozilla's memory consumption. Bam! Something dramatically reduces the GDI resources reported by the Resource Monitor WITHOUT Mozilla consuming much memory at all! Maybe this is not a memory leak per se? Anyway, in running through a bunch of sites to test, I may have found a breakthrough. On this latest build, test1 (CSS Style sheets) never finishes loading. I sat there for about 10 minutes switching between Mozilla (trying to finish loading test1) and memory+ (watching the memory consumption), and I saw something very disturbing. As Mozilla kept trying to finish loading test1, the memory it occupied stayed relatively constant, but the GDI resource level slowly fell from 72% to 33% over the period I watched it. I finally killed Mozilla at 33% to keep my system from locking up. I don't know whether the problem is in trying to finish loading that page, the animated GIF on the page, or my periodically scrolling up and down in Mozilla between program switches, but something on that page is munching GDI resources like there's no tomorrow. I really hope this helps, because I was sure this was a memory leak.
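The two follow-up bugs (a pref-controlled cache size and flushing when resources run low) suggest a byte-bounded cache with an explicit flush hook. This is a hypothetical sketch, not the ImageManager's actual cache: ImageCacheSketch and its methods are illustrative names, images are tracked as plain byte counts rather than real bitmaps, and least-recently-used eviction is an assumption, since the thread does not say which eviction policy the image cache used.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical byte-bounded LRU image cache. The capacity is a constructor
// argument so it could come from a pref instead of a hardcoded 2 MB.
class ImageCacheSketch {
 public:
  explicit ImageCacheSketch(std::size_t aMaxBytes) : mMaxBytes(aMaxBytes) {}

  void Put(const std::string& aUrl, std::size_t aBytes) {
    Remove(aUrl);
    mLru.push_front({aUrl, aBytes});
    mIndex[aUrl] = mLru.begin();
    mUsedBytes += aBytes;
    EvictDownTo(mMaxBytes);
  }

  // A cache hit moves the entry to the most-recently-used position.
  bool Touch(const std::string& aUrl) {
    auto it = mIndex.find(aUrl);
    if (it == mIndex.end()) return false;
    mLru.splice(mLru.begin(), mLru, it->second);
    return true;
  }

  // Low-resource hook: evict until at most aTargetBytes remain.
  void Flush(std::size_t aTargetBytes) { EvictDownTo(aTargetBytes); }

  std::size_t UsedBytes() const { return mUsedBytes; }
  bool Contains(const std::string& aUrl) const { return mIndex.count(aUrl) != 0; }

 private:
  struct Entry { std::string url; std::size_t bytes; };

  void Remove(const std::string& aUrl) {
    auto it = mIndex.find(aUrl);
    if (it == mIndex.end()) return;
    mUsedBytes -= it->second->bytes;
    mLru.erase(it->second);
    mIndex.erase(it);
  }

  void EvictDownTo(std::size_t aLimit) {
    while (mUsedBytes > aLimit && !mLru.empty()) {
      const Entry& victim = mLru.back();  // least recently used
      mUsedBytes -= victim.bytes;         // in a real cache, frees its bitmap
      mIndex.erase(victim.url);
      mLru.pop_back();
    }
  }

  std::size_t mMaxBytes;
  std::size_t mUsedBytes = 0;
  std::list<Entry> mLru;
  std::unordered_map<std::string, std::list<Entry>::iterator> mIndex;
};
```

Usage: construct with 1024 * 1024 for a 1Meg limit, and call Flush(0) from whatever code detects that system resources are nearly exhausted.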
Instead it appears memory is indeed consumed but very slowly. Something out there is consuming GDI resources (not memory?) like Cookie Monster eats cookies.
Added the URL
Note that Mac needs some low memory handling code, as described in bug http://bugzilla.mozilla.org/show_bug.cgi?id=20743
Just checked in the weak ref fix for my part of this.
I'm a dork. That URL was for memory+, not a test case.
More observations: As I stated earlier, I can reproduce the bulk of the GDI resource leaks by simply letting Mozilla (build 1999120908) try to load debug | viewer demos | test1 for as long as I have patience or GDI resources left. I had been wondering whether this leak was a symptom of the scrolling I was doing, the CSS, the animated GIF on that page, or something else. I made a test page that was longer than one screen (so I could scroll up and down) and contained an animated GIF. I patiently played and fiddled, but saw no GDI resource leak. Then I remembered that there was a premade test for animated GIFs. I brought that page up (debug | viewer demos | test10) and watched my system resources for a while. While I did not find evidence of the GDI resource leaks, what I did find is probably not a "Good Thing." I had three programs running: Mozilla maximized, the system resource monitor in a small window, and memory+ maximized. While I left the Mozilla window showing, the resource monitor would slowly tick down the available "system" resources. When I switched to the maximized memory+ (thereby covering up Mozilla), the resources were released and returned to normal. Specifically, when I started this "test" I had 65% available system resources. I counted a slow 120 seconds with Mozilla maximized, alt-tabbed to bring up the resource monitor (covering a small portion of Mozilla's window), and read 8% system resources. It ticked down to 7% while I was watching it. Then I alt-tabbed to bring up memory+ (covering Mozilla, since it was maximized), and the "system" resources immediately returned to 65%. Something weird is going on here, methinks. Just to reiterate, this is NOT the bug I reported here; that one is clearly linked to seeping GDI resources. But it may be related, and I thought it important to share. Oh, I also mangled my copy of test1 (CSS Style test) to remove the background images from the style sheet declaration for H4.
I did this to see if the image was the problem. Nope, it still leaked GDI resources and was never able to "complete" loading that page. (Note that the animation kept spinning as if the page were still loading, even though the entire page described by the HTML was displayed.) I guess the GDI leak resides somewhere in the CSS processing (there sure isn't much else in that test after removing the background stuff).
The Browser Buster Challenge: Eager to flesh out exactly what kinds of things are causing Mozilla to lock up my machine, I tried the browser buster on the debug menu when I noticed it there a half hour ago. Here are the (possibly surprising) results. I used Win95's system resource monitor to monitor my resources. When I started (no Mozilla), I had 84% "system" resources and 99% "GDI" resources. After starting Mozilla and loading mozilla.org, I had 82% system and 88% GDI. I started the browser buster at 23:52:18. As Mozilla loaded amazon.com's page, I received a Win95 "resources dangerously low" warning. At 23:55:36, while trying to load Intel's website (the page after Amazon), Mozilla ceased functioning (approx. 3% GDI). I was able to kill Mozilla with the close button. On shutdown I noticed the command window had many, many WEBSHELL+ = lines in it, although they scrolled too quickly to read the numbers. I decided to rerun the test and watch the console window this time. When I restarted Mozilla, I had 75% "system" resources and 75% GDI resources. I re-ran the browser buster test until the system completely locked up (significantly sooner than the first run, of course). At the time of the lock-up, the last line in the console window was WEBSHELL+ = 23. I'm not sure what that number means (sorry for my ignorance), but if it's a reference count of some type, something's not getting released. Anyway, I thought it would put the severity of the problem into perspective to know that Mozilla locked up my freshly rebooted system after only about 3 minutes of loading pages with the browser buster. (Oh, and I noticed the pages were not loaded in the order listed on the browser buster's top100 page. Amazon was certainly not the 82nd page loaded; maybe the 5th or 6th.)
WEBSHELL+ just shows the total number of webshells that have been created. The BLOATY logs show us leaking webshells on exit, but usually only 1 or 2. We are leaking 1 DrawingSurfaceWin on exit, which potentially holds two handles to bitmaps, but this is not a significant leak; we never seem to leak more than one DrawingSurfaceWin on exit. I tried running the browser buster on WIN95 with a build from this morning, which has all of sarri's changes and mine. It ran through about 20 pages without failing, and the GDI resources stayed at around 68%. The resource level dropped and rose slightly with each page load, as expected. After 20 pages it suddenly dropped to 3%. I hit the back button and the resources jumped back to 68%. I exited and the GDI resources went back to 79%, which is exactly what they were before I ran Mozilla. It was on goto.com when the resources dropped to 3%. I tried loading goto.com directly, but the resources were fine. I ran the browser buster again; this time it went for around 14 pages and then the whole machine locked up. My guess is that all of the resources were suddenly consumed. It looks like there isn't a cumulative GDI leak: you can load page after page and GDI resources are used as expected, and then suddenly, either because the browser gets into some bad state or because something on a previous page triggers it, the GDI resource level drops catastrophically.
I obtained Windows nightly build 1999121008 and did some light browse testing with it. Resource leaking seems to have improved with the latest round of changes. However, as Kevin noted, the GDI resource leak is still there. One interesting thing is a slight change in the GDI resource leak with respect to the CSS Styles viewer demo (test1.html). In previous builds, every time I went to that page, Mozilla would not finish loading it and my GDI resources would start flushing down the drain. This time I started the new build and went directly there. While the page did not "finish" loading, the GDI resources did not leak. BUT, I then did a little casual browsing (cnn, slashdot, blah.com - don't ask) until my GDI resources had naturally dwindled a little (maybe 10% below their original level). Then, frustrated that test1.html did not behave as it had before, I went back to it. This time the floodgates opened, and I watched the resource monitor show GDI resources plummeting 3% every 5 seconds or so. So while the symptoms are not exactly the same, I think a major clue to this GDI resource leak is still lurking in the test1.html viewer demo. It is frustrating trying to determine what exactly causes Mozilla to enter leak mode.
Kevin located the area of the leak! Yesterday, in an attempt to locate the horrible GDI leaks that appear under W95, Kevin sent me a replacement navigator.xul that simply comments out the throbber and the animated status bar. Success! (Applied to build 1999121008.) Since I have been running without those two components, my Mozilla has been 20-50x more functional than ever before! I see virtually NO resource leaks. I am able to browse for 90 minutes (as opposed to three) before a problem crashes the browser. My record of 9 pages in the browser buster has been broken by a complete rotation of all 100 sites! I never realized how functional Mozilla really is once GDI stops leaking and you can use it! There does still seem to be a problem with WEBSHELLs, though. Running the browser buster for long periods of time, I noticed the console window had reached "WEBSHELL+ = 165". Somehow I don't think that is correct. But despite that, functionality is astounding. I would like to beg and plead that the problem with the throbber and/or status bar animation be found and eradicated prior to the M12 release. I heard that M12 might be deemed an alpha release. What I was suffering with was definitely less than alpha quality, but with the leaks nullified, Mozilla seems almost beta quality! I think the majority of W95 users would be astounded at the improvement that eradicating this bug makes!
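The difference between the WEBSHELL+ total and the number of webshells actually leaked can be shown with a small instance counter. This is a hypothetical sketch: InstanceCounter and FakeWebShell are made-up names, and the real BLOATY/refcount logging works differently. The counter prints a line shaped like the console output and tracks a separate live count that a leak report on exit would use.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical per-class counter: sCreated only ever grows (like the
// WEBSHELL+ line), while sLive is what matters for leaks on exit.
template <class Tag>
struct InstanceCounter {
  static int sCreated;
  static int sLive;
  InstanceCounter() {
    ++sCreated;
    ++sLive;
    std::printf("%s+ = %d\n", Tag::Name(), sCreated);
  }
  InstanceCounter(const InstanceCounter&) : InstanceCounter() {}
  ~InstanceCounter() { --sLive; }
};
template <class Tag> int InstanceCounter<Tag>::sCreated = 0;
template <class Tag> int InstanceCounter<Tag>::sLive = 0;

struct WebShellTag { static const char* Name() { return "WEBSHELL"; } };

// Hypothetical webshell that embeds the counter.
struct FakeWebShell {
  InstanceCounter<WebShellTag> mCounter;
};
```

A large WEBSHELL+ number on its own therefore only says many webshells were created; whether they leaked depends on the live count at exit.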
Those cc'd... read the last comment. Sounds like we have GDI resource problems with the throbber.
I emailed email@example.com two navigator.xul files: one that removes the progress meter and another that removes the throbber. This will help narrow the problem down. CC'ing evaughan since he is knowledgeable about the throbber and progress meter.
Preliminary testing shows the leak resides in the animated status bar. I will keep testing the throbber-only version, but so far I have not been able to detect any GDI drain. On the other hand, the XUL with only the animated status bar went into the GDI tailspin on www.abcnews.com, mimicking exactly the "fail to finish loading - 3% per second GDI drop" syndrome normally associated with the CSS Styles demo (test1.html). Just a note: while this "fail to load" syndrome causes a tailspin plummet of GDI resources, GDI resources do slowly dwindle even on pages that "finish" loading.
This is all speculation...hope it does not cloud the issue... The problems seen in bug 19965 "Non-stopping throbber on specific content" may be relevant here. I could not reproduce that symptom, but noticed that Mozilla appeared to re-fetch a 15 second gif animation in a testcase provided by the reporter each time before re-displaying it (at least, something was fetched between the last frame and the first). Doubly speculative: In Win32 M11 the throbber and/or candypole may be being re-fetched continually from the OS's disk cache without the previous set of frames being re-used from the image cache (possibly iff the viewed pages clobber the contents of the image cache with their images) and resources are not being freed when this happens.
Looks like I can run for about 50 page loads on browser buster before my GDI sinks below 10% of available resources and my system hangs. This is better than previous milestones, but there is still a ways to go.
Whiteboard: [PDT+] 12/3 → [PDT+] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks
Whiteboard: [PDT+] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks → [PDT] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks
Target Milestone: M12 → M13
With 12/14 builds I went over 120 page loads, and then went on and did other work before shutting down. Enough of this is fixed that we can consider it dogfood/PDT-, moving the remaining work out to M13.
Whiteboard: [PDT] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks → [PDT-] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks
Putting on PDT- radar.
The only WIN32 handles that the progress meter creates are timers. As a test, I monitored the total number of timers allocated and deallocated by adding debugging code everywhere a WIN32 timer is created with ::SetTimer or removed with ::KillTimer. Running on WINNT, I don't see any excessive creation of timers; the number of timers is always < 3. I was able to get Mozilla to fail (i.e., GDI resources went to 0) in a build with debugging info while running the browser buster. I got the following asserts: "Attempt to blit with bad DC" in nsRenderingContextWin at 2194, then four "recursive painting not permitted" asserts in nsViewManager.cpp. The URL at the time of failure was www.davesclassics.com. These asserts are probably just the result of the resources going to zero. I did find incorrect code in nsProgressMeterFrame::Notify. The StripTimer::Notify method in nsProgressMeterFrame.cpp updates the progress meter by doing the following: vm->UpdateView(view, bounds, 0); The last parameter (0) is not a valid value to pass to UpdateView. Looking at the code in nsViewManager::UpdateView, passing 0 causes the view to be updated immediately. The call should probably be changed to: vm->UpdateView(view, bounds, NS_VMREFRESH_NO_SYNC); Views are normally updated with the NS_VMREFRESH_NO_SYNC flag, which means they will not be painted until an NS_PAINT message comes through the message queue. If we need the progress meter to paint synchronously, we should pass NS_VMREFRESH_IMMEDIATE. I'm reassigning this bug to evaughan, since he is planning on reimplementing the progress meter to use images.
I also used to face this problem on Win95. In our automation tests, the system used to hang after every 40 to 50 test cases because of memory leaks, so after every 40 to 50 test cases I had to restart the automation from the next test case. It was really painful to run thousands of test cases. Fortunately, I haven't been experiencing this [much] for the last two or three builds. Today's [12-16-09] build in particular is excellent, and it did not stop the automation every 40-50 test cases. But if a test case involves an applet, it still does the same thing and runs out of memory. [I think that could be a totally separate issue.] Test cases with applets still have memory leak problems on Win95. These memory leaks are only on Win95.
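A toy view manager illustrating why the flag matters: when the flag lacks the no-sync bit (including a stray 0, which the comment above says results in an immediate update), every timer tick paints synchronously, while the no-sync flag defers the work until a paint message is processed. The flag values, the coalescing of pending updates, and ViewManagerSketch are assumptions for illustration, not nsViewManager's real constants or implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flag values; only the names echo the real constants.
enum RefreshFlags : unsigned {
  kRefreshImmediate = 0x1,  // stands in for NS_VMREFRESH_IMMEDIATE
  kRefreshNoSync    = 0x2,  // stands in for NS_VMREFRESH_NO_SYNC
};

class ViewManagerSketch {
 public:
  int paints = 0;

  void UpdateView(int aViewId, unsigned aFlags) {
    mDirty.push_back(aViewId);
    // Without the no-sync bit (e.g. a stray 0), paint right now.
    if (!(aFlags & kRefreshNoSync)) Paint();
  }

  // Called when the paint message is pulled off the message queue;
  // all updates queued since the last paint are handled in one pass.
  void OnPaintMessage() {
    if (!mDirty.empty()) Paint();
  }

 private:
  void Paint() {
    ++paints;
    mDirty.clear();
  }
  std::vector<int> mDirty;
};
```

In this model, five timer ticks with flag 0 cost five synchronous paints, while five no-sync updates cost one paint when the message arrives.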
I concur. Using build 1999121612 I found the fatal GDI resource leaks to be gone. I was able to do a good bit of by-hand browsing, some 50 pages of browser buster browsing, and even the usually fatal CSS Styles Demo (test1.html) all without incident. It would appear from my perspective that the problem has been fixed - or shrunk to the point that it is not noticeable.
> Then the following 4 asserts "recursive painting not permitted" > nsViewManager.cpp. Some speculation: maybe the cause of the recursive painting asserts (and the resource leaks) was that the timer for repainting the progress meter was firing before the repaint for the previous timer notification had a chance to complete (i.e., there was a re-entrancy problem). The parameter that made UpdateView repaint synchronously was probably a contributing factor. Somebody checked in a change that reduced the repaint timer frequency, which may have unknowingly fixed this problem by reducing the timer notification frequency. Changing UpdateView to paint asynchronously, or delaying restarting the timer until painting has completed for the current cycle, may also be necessary for slow machines. If this speculation is true, then the speed of the machine would be a significant factor. firstname.lastname@example.org: what processor/clock speed is your machine?
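Under the re-entrancy speculation above (which was not confirmed in this bug), the suggested mitigation of not restarting the timer until the current paint completes, plus a guard against painting while a paint is already in progress, could look like this. AnimatedMeterSketch is hypothetical, and a boolean stands in for re-arming a real timer.

```cpp
#include <cassert>

// Hypothetical timer-driven animation with a re-entrancy guard.
class AnimatedMeterSketch {
 public:
  int paintsStarted = 0;
  int reentrantAttempts = 0;
  bool timerArmed = true;  // stands in for a real, re-armable timer

  // Timer callback: a tick that arrives mid-paint is dropped instead of
  // starting a nested paint.
  void OnTimer() {
    if (mPainting) {
      ++reentrantAttempts;
      return;
    }
    timerArmed = false;  // don't re-arm until this paint has finished
    BeginPaint();
  }

  void BeginPaint() {
    mPainting = true;
    ++paintsStarted;
  }

  // Only once the current cycle has painted is the timer re-armed.
  void EndPaint() {
    mPainting = false;
    timerArmed = true;
  }

 private:
  bool mPainting = false;
};
```

With this shape, a slow machine simply animates at a lower rate instead of piling up paint requests.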
email@example.com, did we get a bug filed on the possible gdi leak with applets, and assigned to drapeau? -thanks
For michael.lowe's scenario of a timer event nesting in a paint, we'd have to be nesting an event loop, it seems to me. Don't think that's possible from paint code -- dougt and danm might know more. /be
The only machine I have W95 loaded on for testing runs on a C6-200 chip. Timing could explain why the people who experience this problem see such startlingly different magnitudes of it. Strictly speculation, mind you.
The progress meter is going to be rewritten to use an animated GIF, so all the timer and paint stuff will go away.
won't happen for m13
Summary: [DOGFOOD]Major memory leaking in M11 under W95 OSR2 → Major memory leaking in M11 under W95 OSR2
Whiteboard: [PDT-] 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks → 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks
moving from leftover dogfood to beta1 radar
Whiteboard: 12/16 removing progress meter from navigator.xul fixes majority of remaining leaks → fixed awaiting checking
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED
Whiteboard: fixed awaiting checking
Fixed. The progress meter is now an animated GIF and no longer uses the timers that were causing the problem.
*** Bug 22005 has been marked as a duplicate of this bug. ***