Closed Bug 20743 Opened 25 years ago Closed 24 years ago

Need low-memory handling code to avoid crash

Categories

(SeaMonkey :: General, defect, P2)

PowerPC
Mac System 8.5
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sfraser_bugs, Assigned: sfraser_bugs)

References

Details

(4 keywords, Whiteboard: [rtm++][p:2][dogfood-][Mac issue][PDTP2])

Attachments

(1 file)

We need to have some code that deals with low-memory situations. This code should detect low-memory situations, and respond by flushing various caches (imglib etc). Cc:ing pierre to get input on the embedding situation, since this is something that an embedder might already have code for, but needs to call into Gecko to tell us to flush caches.
It should work both ways. If we detect a low memory condition, we could ask the application to free some caches too. It makes me think that we should write some sample code to use Gecko as a separate component instead of linking it as static libraries into the app's project so that the memory allocators don't interfere.
Status: NEW → ASSIGNED
Target Milestone: M13
M13. Not sure where this code should live.
The embedding code is now in lib/mac/embedding.
Blocks: 19187
Summary: Need low-memory handling code → [DOGFOOD]Need low-memory handling code
Target Milestone: M13 → M12
another PDT+ is dependent on this one, either this needs to be PDT+, or we need to downgrade 19187
Whiteboard: [PDT-]
Putting this on the PDT- radar. We agreed to not fix, but workaround 19187 by increasing partition size.
Target Milestone: M12 → M13
moving to m13
Target Milestone: M13 → M14
I'm on vacation until Jan 10; moving this to M14
Target Milestone: M14 → M16
moving to m16 - load balance
Putting dogfood in the keyword field.
Keywords: dogfood
Summary: [DOGFOOD]Need low-memory handling code → Need low-memory handling code
Blocks: 4201
m17
Target Milestone: M16 → M17
Make jrgm QA contact, since he sees crashes resulting from low memory sitations frequently. Brendan: this bug calls for some memory pressure API. Nominating for nsbeta2, since jrgm sees low memory crashes frequently.
Keywords: nsbeta2
QA Contact: leger → jrgm
Summary: Need low-memory handling code → Need low-memory handling code to avoid crash
Whiteboard: [PDT-]
Putting on [nsbeta2-][dogfood-] radar. Puttig on nsbeta3 keyword for work on that release.
Keywords: nsbeta3
Whiteboard: [nsbeta2-][dogfood-]
Keywords: arch
I'd like to add something to nsIAllocator to allow memory flushers to be registered. Is this what you had in mind Simon?
Yes. And I need to do some mac code to detect and respond to low-memory situations.
Simon: I reworked nsIAllocator to be nsIMemory, which now includes the additional methods: void registerObserver(in nsIMemoryPressureObserver obs); void unregisterObserver(in nsIMemoryPressureObserver obs); where: interface nsIMemoryPressureObserver : nsISupports { const unsigned long REASON_ALLOC_FAILURE = 0; const unsigned long REASON_HEAP_MINIMIZE = 1; /** * Called in response to a low-memory condition. * @param reason - either REASON_ALLOC_FAILURE when alloc or realloc * fails, or REASON_HEAP_MINIMIZE when heapMinimize is * called * @param requestedAmount - either the size requested by alloc or realloc, * or 0 if heapMinimize was called */ void flushMemory(in unsigned long reason, in size_t requestedAmount); }; so you be able to hook mac-specific flushers into this.
Adding crash to keyword field.
Keywords: crash
*** Bug 41607 has been marked as a duplicate of this bug. ***
Keywords: relnote
Removing whiteboard entries for reconsideration. I don't understand how a dogfood+ bug could be closed as a dup of this, yet this isn't even nsbeta2+. Bumping up the partition does not fix this, at most it delays the problem while raising the bar on target machines. Isn't it risky to wait until b3 to add code to handle such fundamentals? Also, what would the relnote say - "Will crash frequently at random"?
Whiteboard: [nsbeta2-][dogfood-]
I agree with trudelle. And to clarify the effects of increasing the partition size: giving the app more memory will not always avoid out of memory crashes. This is because there are complex interactions between our memory allocators (which grab a chunk of the heap on startup), and memory allocation used for GWorlds and other toolbox structures (used for images and double buffering). We really need some new code that does the right thing under low memory.
Simon: waiting for a better solution, we still need to increase the memory partition. These days, the daily optimized builds don't even allow me to have a browser window and check my mail. They have a 12Mb partition, we could increase it to 16Mb.
[dogfood-][nsbeta2-] would like to have for beta3
Whiteboard: [dogfood-][nsbeta2-]
Fine. I opened bug 41760 to have a solution for beta2.
Pierre: the app uses temp memory when it needs to, for malloc/new as well as the backing store GWorlds, so theoretically there should be no need to increase the partition size. However, there is code that fails to be smart about GWorld allocations in temp mem, so a partition size increase may be in order. We should set the partion size so that with a single browser window, we don't have to alloate any temp mem blocks. I think, now, we allocate 4 1Mb blocks, so this might call for increasing the partition size by about 4Mb.
Blocks: 43060
I set my new Mac to millions of colors and crashed today opening a new window (one time I saw an assert for failing to allocate a gworld right before I crashed). My work-around is to set my monitor to thousands of colors. Also, I saw my memory partition nearly full (only about 400K spare).
Blocks: 3412
Keywords: footprint
removing status whiteboard text and setting to m18
Keywords: nsbeta2
Whiteboard: [dogfood-][nsbeta2-]
Target Milestone: M17 → M18
Putting on [dogfood-] radar. Not critical to everyday use. Putting on [NEED INFO] radar. PDT needs to know impact to user and risk of fix to make a call on this bug. How often does this happen? Has anything changed since the last two times it's been given a -? Is this worth pulling pr2 off the wire.
Whiteboard: [dogfood-][NEED INFO]
Sooner or later, the user is going to crash hard if this bug is not fixed. When/ if they crash will depend on their partition size, how many pages they have loaded, their screen depth, the number of windows open, and how big those windows are. And simply increasing the partion size will not make the crashes go away, necessarily. What more info do you need?
Blocks: 46000
Whiteboard: [dogfood-][NEED INFO]
Putting on [nsbeta2+][dogfood-] radar. Does not need a fix ASAP for daily work, but we should fix this for beta2.
Whiteboard: [nsbeta2+][dogfood-]
You really meant nsbeta2+? Not nsbeta3+?
Fix checked in. I now implement the traditional Mac application low memory handling, with a 32k memory reserve handle, and a GrowZoneProc.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Well, I've been beating up my Mac, and I cannot trigger the sort of memory problems that I had in the past (where my windows would, er, evaporate at certain times [evaporate being a highly technical term]). I played with my shrinking my preferred size, loading large pages while resizing, resizing with cmd held down, multiple windows, etc. ... Marking verified fixed.
Status: RESOLVED → VERIFIED
Hrmm. I've seen a commercial build go belly-up with a dsMemFullErr a few times now. I think there is still work to do here.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Removing nsbeta2+; the additional work is not landing for PR2.
Whiteboard: [nsbeta2+][dogfood-] → [dogfood-]
setting to nsbeta3+, setting priority to p:2 -- Simon, need your input
Priority: P3 → P2
Whiteboard: [dogfood-] → [nsbeta3+][p:2][dogfood-]
update whiteboard
Hardware: All → Macintosh
Whiteboard: [nsbeta3+][p:2][dogfood-] → [nsbeta3+][p:2][dogfood-][Mac issue]
PDT agrees P2 if it still is reproducible
Whiteboard: [nsbeta3+][p:2][dogfood-][Mac issue] → [nsbeta3+][p:2][dogfood-][Mac issue][PDTP2]
Depends on: 46337
Mac fixes are checked in. This is still waiting on 52402.
Depends on: 52402
Marking this fixed now that imagelib flushing works. To test: open lots of browser windows, mail, composer, and load some heavy web pages (http://www.voodooextreme.com, cnn.com, news.bbc.co.uk etc). However much you load, it should not crash.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Hrmm, things are still bad here. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
*** Bug 53633 has been marked as a duplicate of this bug. ***
Fixes checked in; there was a bug where I was not making the reallocated memory buffer handle purgeable, and I tweaked the numbers so that GFX leaves a bit more free heap space
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
> However much you load, it should not crash. Okay, reopening on that basis, but Simon can use his own judgement on how to deal with this. I must say, though, that the performance is really quite good and the first time I managed to crash the app, I had ~25 browser windows open to home.netscape.com, a 3700 message IMAP folder open, AddressBook, and 9 Composer windows open (that's a lot!!, and more than an average user would use). However, after the crash, I devised a random page disher and set it as my home page (http://jrgm.mcom.com/cgi-bin/random-page.pl) where the pages are the ones above plus slashdot, home.netscape.com, an lxr page, and tinderbox. With that, I usually hit a crash at about page 9 or 10, with this kind of info from 'how', 'hc all', 'ht', and 'sc'. Bus Error at FFC66426 _InitProcMenu+00246 while reading word from 1A23ABFF in User data space Checking all heaps The System heap at 00002800 is ok The ROM read-only heap at 00028F20 is ok The Process Manager heap at 01153AB0 is ok The ÒNetscape 6Ó heap at 0E176040 is bad This block's back pointer doesn't point to the previous block. Block header 0EE3C9D0 0E01 2020 AB00 0004 0000 0090 00A5 79FC ¥¥ «¥¥¥¥¥¥�¥¥yü The ÒShareWay IP Personal BgndÓ heap at 0F26A670 is ok The ÒFile Sharing ExtensionÓ heap at 0F339280 is ok The heap at 0F376DB0 is ok The ÒFinderÓ heap at 0F3936D0 is ok The ÒTime SynchronizerÓ heap at 0F496B30 is ok The ÒOpen Transport SNMPÓ heap at 0F4BA390 is ok The ÒNorton SchedulerÓ heap at 0F52B3F0 is ok The ÒFolder ActionsÓ heap at 0F5BC450 is ok The ÒFBC Indexing SchedulerÓ heap at 0F62D4B0 is ok The ÒControl Strip ExtensionÓ heap at 0F667970 is ok System heap high free space + TempMem low free space = #197049584 (#187M) Totaling the ÒNetscape 6Ó heap at 0E176040 The ÒNetscape 6Ó heap at 0E176040 is bad This block's back pointer doesn't point to the previous block. Block header 0EE3C9D0 0E01 2020 AB00 0004 0000 0090 00A5 79FC ¥¥ «¥¥¥¥¥¥�¥¥yü Calling chain using A6/R1 links Back chain ISA Caller (for comparison, I can do the same test on win32 and linux for >20 pages, by which time the Mem size of netscape is > 96MB, and I have put the OS into paging thrash.). [Sidenote: I think the Mac new window performance is better than Windows and Linux when showing the new chrome window! Way to go.].
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What killed you there is heap corruption, not simple out of memory problems. I filed a new bug, bug 54023, on the heap corruption problem, and am going to close this bug again.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Brade and other still see low memory crashers. I see that it's possible to get into this state when nothing can be flushed from imglib caches. More tuning required here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
setting to nsbeta3-, and rtm+ it's a crash, which can and should be resolved for rtm
Keywords: rtm
Whiteboard: [nsbeta3+][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm+][p:2][dogfood-][Mac issue][PDTP2]
Target Milestone: M18 → M19
Simon, please include the required information per the rtm checkin rules
Whiteboard: [nsbeta3-][rtm+][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm+ NEED INFO][p:2][dogfood-][Mac issue][PDTP2]
What's killing us here are the additional handles created for each NewGWorld call. Each one creates 29 handles. When allocating a 1-bit GWorld (e.g. for a GIF mask), one of those handles is 4.6K in size. These go into the heap even if the image bits go into temp mem. So, possible solutions to fixing this: 1. Leave lots of heap space free, and hope that someone does not create enough images that they run out. 2. Use Heap zones to ensure that GWorld bits are controllable, and don't fill the heap. 3. Don't use GWorlds at all, reverting to the old hand-rolled, new[] buffer allocations for image bits. 4. Don't use GWorlds, but create CGrafPorts, PixMapHandles etc by hand. 5. Don't use GWorlds for 1-bit masks, using hand-rolled BitMaps instead (there is a technote on how to do this). Others?
removing + per pdt sw rules
Whiteboard: [nsbeta3-][rtm+ NEED INFO][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm NEED INFO][p:2][dogfood-][Mac issue][PDTP2]
Are we crashing inside NewGWorld()? If not, there is no reason to remove it. If we do, let's make sure we have enough free mem before calling it.
There is plenty of reason to not use GWorlds for images, and hand-roll PixMaps instead (as dcone's old code used to do) -- it would save 2-4K of heap data per image. Where the crash happens can vary; it can be any Toolbox call that allocates memory in the heap. In debug builds it's often in TextEdit while outputting to the console, but it could be in NewRgn or NewGWorld. Right now, if we detect insufficient memory in the heap, we call NewGWorld with the useTempMem flag. But this still puts 29 small handles in the heap -- only the pixels go into temp mem. Maybe a cheap solution to this bug would be to simply return an out of memory error in this case, and not attempt to allocate the GWorld at all.
I attach some diffs that I think are the safest way to fix this now. mozilla/gfx/src/mac/nsDrawingSurfaceMac.cpp: Change some constants so that we go to temp mem for drawing surface allocations (of which there are few, and which are normally big) earlier, leaving more heap mem free. Go to temp mem if < 1024k (max) or 512k (contiguous) is free. mozilla/gfx/src/mac/nsImageMac.cpp Similarly to the above patch, I changed some constants so that we go to temp mem earlier (1024k max, 768k contig). However, I also changed the logic so that we'll acutally refuse to allocate at all if memory is below a lowet threshold (512k max, 384k contig). This ensure that we never totally fill the heap with the heap- based handles that GWorlds create (even when allocating the pixels of the GWorld in temp mem). It is safe to allocate GWorlds; these failure paths have been tested, and imglib recovers safely. If the images that fail to allocate are chrome images, you may see missing images in the UI. However, this is preferable to crashing. Longer term, I think we need to go back to hand-rolling PixMaps for image data.
s/lowet/lower s/is safe to allocate/is safe to fail to allocate
r=pierre
sr=dcone
change rtm NEED INFO to rtm+
Keywords: nsbeta3
Whiteboard: [nsbeta3-][rtm NEED INFO][p:2][dogfood-][Mac issue][PDTP2] → [rtm+][p:2][dogfood-][Mac issue][PDTP2]
PDT says rtm++, please land on branch ASAP.
Whiteboard: [rtm+][p:2][dogfood-][Mac issue][PDTP2] → [rtm++][p:2][dogfood-][Mac issue][PDTP2]
Probably too late in the game, but Mac Mozilla Installer build 2000101808 blew up in the manner John Morrison mentioned. When I crashed, I had perhaps 10 navigator windows open, one of them to http://www.apple.com/ which AFAIK has a QuickTime movie on it. I crashed so hard, MacsBug wouldn't even give me a stdlog -- all I got was this: MacsBug 6.6.3, Copyright Apple Computer, Inc. 1981-2000 Bus Error at FFC6640A _DeleteMenu+0008A while reading long word from 4F430000 in User data space 18-Oct-2000 9:48:44 PM (since boot = 18 minutes) Current application is ?Mozilla? Machine = #406 (NewWorldMac), System $0904, sysu = $01008000 Unable to access that address PC: FFC66424 Frame Type: B008 Log file already open Dunno if this helps, but thought I might as well post anyway.
Fix checked into trunk and branch. I spun off bug 57327 for going back to using PixMaps here.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
I can't find any setting of the Mac application heap that will let my debug build of Mozilla load its chrome. I am very tempted to back out this change to nsImageMac.cpp. If I revert to v1.30, things work.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
beard: Could you give more details? I have a fresh debug build from tonight that runs perfectly well with the different chromes and within the default 16597 Kb partition (VM off).
beard: use ZoneRanger to look at your heap. Why is yours different?
beard: are you running the Boehm GC, or MJR stuff, or something else that eats heap space?
Beard was running with the GC stuff, which never called MaxApplZone(). He's gonna fix that.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Say, Simon, et al. Given that this bug has been reopened 5 times, and with that other memory bug (low pref size) floating around, I don't want to confuse matters by doing another reopen. Performance in this regard is mightily improved. But by the arbitrary test of loading from a pallette of heavy web pages (dished from my startup URL), while also having mail, composer, etc. open -- at the point where I went to get my 45th browser window open, I hit an unmapped memory exception with the stack in a call to a factory (in CSS Style somewhere). Should I reopen this bug or should I file a new bug for sometime after RTM (or would you prefer to tell me "Don't do that ...").
I certainly did not test as thoroughly as you did but I don't crash anymore. However, and this is a memory-related problem too, I hit constantly bug 57835 / bug 58490 where the chrome falls apart after opening a few windows of different types. If we still have a little bit of time to spent on low-mem problems, I think it would be better to focus on the problems with the chrome (apparently fixed today, in the builds tomorrow) than crashes that require dozens of browser windows.
Oh, absolutely -- if we reopened this bug, it would be for post-RTM work. There's no "flip one switch" at hand, I don't think, and the "missing chrome" is a far more tangible bug, than some idiot (me) opening 45 browser windows :-] At any rate, this is fine for the branch, so I'm marking it vtrunk, and we can pick it up later, or open a brand-new shiny bug for later.
Keywords: vtrunk
jrgp: please open a separate bug with your CSS-related crash, with MacsBug log is availabule. Thanks.
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: