Closed Bug 445385 Opened 16 years ago Closed 12 years ago

random crashes like [@ RaiseException], gdi resource starvation

Categories

(Core :: Graphics, defect)

1.9.0 Branch
x86
Windows XP
defect
Not set
critical

RESOLVED INCOMPLETE

People

(Reporter: bugzilla_mozilla_org, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [needs stack (probably from WinDbg) with artificially limited GDI])

Crash Data

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0

firefox chews through gdi resources until everything is unstable. ff usually crashes, but it causes problems with the OS even if the "desktop heap" size is increased to compensate (for example: "Application popup: dwwin.exe - Application Error : The application failed to initialize properly (0xc0000142). Click on OK to terminate the application.", which also OFTEN keeps the crash reporter dialog from starting properly)

GDIIndicator (http://msdn.microsoft.com/en-us/magazine/cc188782.aspx) is reporting over 8079 GDI objects for one instance of FF alone, of which 3011 are brushes (i poked around the code: every nsWindow has a brush created, with no cache for common colors and no use of stock brushes for, say, white and black)
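
To illustrate the kind of cache the paragraph above says is missing, here is a minimal, portable sketch. It is not Gecko's code: `BrushCache`, `Color` and `BrushHandle` are hypothetical names standing in for `COLORREF` and `HBRUSH`, and the creator callback is assumed to wrap something like `CreateSolidBrush` on Windows (white and black could instead come from `GetStockObject`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins: on Windows these would be COLORREF and HBRUSH.
using Color = std::uint32_t;
using BrushHandle = std::uintptr_t;  // 0 means "creation failed"

// Shares one brush per color instead of creating one per window.
class BrushCache {
public:
    // `create` is an assumption: a callback that makes a brush for a color.
    explicit BrushCache(std::function<BrushHandle(Color)> create)
        : create_(std::move(create)) {}

    // Returns the shared brush for `color`, creating it on first use only.
    // A failed creation returns 0 and is not cached, so it can be retried.
    BrushHandle Get(Color color) {
        auto it = brushes_.find(color);
        if (it != brushes_.end()) return it->second;
        BrushHandle h = create_(color);
        if (h != 0) brushes_.emplace(color, h);
        return h;
    }

    // Number of distinct brushes currently held by the cache.
    std::size_t size() const { return brushes_.size(); }

private:
    std::function<BrushHandle(Color)> create_;
    std::unordered_map<Color, BrushHandle> brushes_;
};
```

With a cache like this, the brush count would scale with the number of distinct colors rather than the number of windows, which is the saving the reporter is hinting at.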

the default cap for an app is 10000 objects, and allocations are already failing for some types of gdi objects in the process. it eventually degenerates into waiting for gdi objects to be created and being unresponsive, as old objects become invalid
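
The arithmetic behind that concern can be sketched as a small headroom check. This is an illustration only: `GdiHeadroom`, `ComputeHeadroom` and the 80% warning threshold are hypothetical, and on Windows the `used` value is assumed to come from `GetGuiResources(GetCurrentProcess(), GR_GDIOBJECTS)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for the sketch.
struct GdiHeadroom {
    std::uint32_t remaining;  // objects left before the cap is reached
    bool nearLimit;           // true once usage reaches warnPercent of the cap
};

// Compares a GDI object count against a cap. The default threshold of 80%
// is an arbitrary choice for illustration, not a documented value.
inline GdiHeadroom ComputeHeadroom(std::uint32_t used, std::uint32_t cap,
                                   std::uint32_t warnPercent = 80) {
    std::uint32_t remaining = used >= cap ? 0 : cap - used;
    // Widen to 64 bits so the multiplications cannot overflow.
    bool warn = static_cast<std::uint64_t>(used) * 100 >=
                static_cast<std::uint64_t>(cap) * warnPercent;
    return GdiHeadroom{remaining, warn};
}
```

For the figures in this report, `ComputeHeadroom(8079, 10000)` leaves 1921 objects of headroom and is already past the 80% mark.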

granted, I probably have too many tabs in my session, but ff2 at worst got extremely slow, which was a clear indicator that I had too many open. with ff3 i can barely keep a session alive long enough to close more than a few tabs

finally, i saw some leak notices from cairo and poked around there a bit. i think most of the leaked and unrecovered objects are occurring in cairo

Reproducible: Always

Steps to Reproduce:
1. shrink desktop heap size artificially (http://www.ss64.com/orasyntax/desktopheap.html)
2. open enough tabs to consume said heap
3. ???
4. profit
Actual Results:  
(i did not need to do step 1, as my session already hits my increased desktop limits)

firefox and the entire OS become mildly unresponsive: opening new apps fails, popup menus don't pop up, the desktop doesn't refresh... really every symptom of starved gdi/user resources

Expected Results:  
browser would self limit object usage, or at the very least not leak them and fail

when the crash reporter does work, or I look at the stack in windbg, it's often an unhandled exception from new, or the cycle collector is on the stack

some crash reports (2 of about 200 crashes, crash reporter usually fails to start, as above)

http://crash-stats.mozilla.com/report/index/cb43e94a-5295-11dd-ac00-001a4bd43e5c
http://crash-stats.mozilla.com/report/index/2bcd3ed9-5200-11dd-b70d-0013211cbf8a

(these are from the 3.0.1 version on one machine, but i have 2 other sessions that act exactly the same in every manner on 3.0 release)
Version: unspecified → 3.0 Branch
Component: General → GFX: Thebes
Keywords: footprint, mlk, topcrash
Product: Firefox → Core
QA Contact: general → thebes
Version: 3.0 Branch → 1.9.0 Branch
since you're controlling things, in case you aren't aware of them:
http://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg
http://developer.mozilla.org/en/docs/Using_the_Mozilla_source_server

attach windbg before you torture firefox and then you can get a stack trace :).
actually, for this purpose, i'd recommend ntsd or cdb, but i'm sure you're aware of those things....

anyway. i'm confused, shouldn't gdi limits be unrelated to oom limits? or did you basically leave less space for the app by growing the size of the gdi heap?

anyway, if you could restrict the gdi heap for gecko and cause it to crash when it fails to handle gdi failure, you could file bugs about that :).
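
One way to read the suggestion above is as fault injection: wrap object creation so it starts failing at a chosen limit, then see which callers mishandle the failure. The sketch below is a portable model of that idea, not an existing Gecko or cairo facility. `LimitedCreator` and `Handle` are hypothetical names, and the wrapped callback is assumed to stand in for a real GDI creation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for a GDI handle; 0 means "creation failed".
using Handle = std::uintptr_t;

// Wraps a creation function and makes it fail once `limit` handles are live,
// so that callers' failure handling can be exercised on purpose.
class LimitedCreator {
public:
    LimitedCreator(std::function<Handle()> create, std::size_t limit)
        : create_(std::move(create)), limit_(limit) {}

    // Returns 0 when the artificial limit is reached or the real call fails.
    Handle Create() {
        if (live_ >= limit_) return 0;  // simulated resource exhaustion
        Handle h = create_();
        if (h != 0) ++live_;
        return h;
    }

    // Callers report each successfully created handle here when they free it.
    void Release(Handle h) {
        if (h != 0 && live_ > 0) --live_;
    }

    // Number of handles currently counted as live.
    std::size_t live() const { return live_; }

private:
    std::function<Handle()> create_;
    std::size_t limit_;
    std::size_t live_ = 0;
};
```

Any caller that uses the result of `Create()` without checking for 0 is a candidate for a separate bug report.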

the gdi code should mostly be near these files:
http://mxr.mozilla.org/mozilla-central/search?string=brush&find=cairo&hitlimit=1

/gfx/cairo/cairo/src/cairo-win32-surface.c
/gfx/cairo/cairo/src/cairo-win32-font.c
/gfx/cairo/cairo/src/cairo-win32-private.h
/gfx/cairo/cairo/src/cairo-win32-printing-surface.c

my guess is that we have too many surface->bitmap's alive, although it's possible that the problem is surface->dc. I'm assuming that we have a fairly limited number of scaled_font->scaled_hfont's and scaled_font->unscaled_hfont's floating around.
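
A guess like this could be checked by tallying live objects per kind at creation and destruction. The following is a minimal sketch under that assumption. `GdiTally` and `GdiKind` are hypothetical names, and the hooks would have to be called by hand from wherever bitmaps, DCs and fonts are created and freed; nothing like this is claimed to exist in cairo.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical categories matching the fields named above.
enum class GdiKind : std::size_t { Bitmap, DC, Font, Count };

// Counts live objects per kind so the dominant consumer can be identified.
class GdiTally {
public:
    void OnCreate(GdiKind k) { ++live_[Index(k)]; }

    // Ignores unmatched destroys rather than underflowing the counter.
    void OnDestroy(GdiKind k) {
        if (live_[Index(k)] > 0) --live_[Index(k)];
    }

    std::size_t Live(GdiKind k) const { return live_[Index(k)]; }

    // Returns the kind with the most live objects (first one wins on a tie).
    GdiKind Largest() const {
        std::size_t best = 0;
        for (std::size_t i = 1; i < live_.size(); ++i) {
            if (live_[i] > live_[best]) best = i;
        }
        return static_cast<GdiKind>(best);
    }

private:
    static std::size_t Index(GdiKind k) { return static_cast<std::size_t>(k); }
    std::array<std::size_t, static_cast<std::size_t>(GdiKind::Count)> live_{};
};
```

If `Largest()` kept returning `GdiKind::Bitmap` as tabs were opened, that would support the bitmap theory over the DC one.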
i had a pile of traces before i realized that it was basically random, and apparently the dumps i saved trashed them. i'll collect some more asap
they all show the same signature, i don't think they're related to this bug
Whiteboard: [needs stack (probably from WinDbg) with artificially limited GDI]
Josh, can you still reproduce your issue using FF 3.6 or trunk?

Some gdi issues might have been resolved as of [fixed1.9.1b4] in...
  Bug 485351 -  Hang [@ gfxWindowsFont::ComputeMetrics] and eating up all GDI resources with percentage height, mathml and binding

I didn't look hard enough before filing my  Bug 551667 -  window paint problems with high gdi object count (approaching 10,000) on vista

Are these core graveyard bugs totally obsolete, i.e. is the code behind them no longer used?
 Bug 218861 -  nsImageWin: come up with a better way to limit GDI object usage 
 Bug 236415 -  All WIN32 GDI calls should check return code
    (has a draft patch)

There is also
 Bug 357765 -  Use BeginBufferedPaint for double buffering on Vista
query: http://tinyurl.com/ykfvhxv
comment 4 is just crashing due to oom in gc handling. it's safe and unfortunate. i recently landed a backport of our oom handling for that, i don't really know what will happen when that fails.
hah, I had completely forgotten about this bug. I had stupidly disabled the swap on that machine, and as far as I can recall it hasn't done it since I added some back. it's still probably something ff should scale back on ... somehow, but enabling the swap probably just pushed the limit further out; to wit, I haven't had it happen since
so do we close this invalid, or dup it to some "do oom better" bug? (which I'm not much in touch with)

(In reply to comment #8)
> comment 4 is just crashing due to oom in gc handling. it's safe and
> unfortunate. i recently landed a backport of our oom handling for that

timeless, for which 1.9.x?
Summary: random crashes, gdi resource starvation → random crashes like [@ RaiseException], gdi resource starvation
probably 1.9.2
Crash Signature: [@ RaiseException]
reporter no longer has test environment.
however, in XP environment it's not necessarily hard to drive up GDI usage
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE