Closed Bug 20743 Opened 25 years ago Closed 24 years ago

Need low-memory handling code to avoid crash

Categories

(SeaMonkey :: General, defect, P2)

PowerPC
Mac System 8.5
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sfraser_bugs, Assigned: sfraser_bugs)

References

Details

(4 keywords, Whiteboard: [rtm++][p:2][dogfood-][Mac issue][PDTP2])

Attachments

(1 file)

We need to have some code that deals with low-memory situations. This code should
detect low-memory situations, and respond by flushing various caches (imglib
etc).

Cc:ing pierre to get input on the embedding situation, since this is something
that an embedder might already have code for, but needs to call into Gecko to
tell us to flush caches.
It should work both ways. If we detect a low memory condition, we could ask the
application to free some caches too.

It makes me think that we should write some sample code to use Gecko as a
separate component instead of linking it as static libraries into the app's
project so that the memory allocators don't interfere.
Status: NEW → ASSIGNED
Target Milestone: M13
M13. Not sure where this code should live.
The embedding code is now in lib/mac/embedding.
Blocks: 19187
Summary: Need low-memory handling code → [DOGFOOD]Need low-memory handling code
Target Milestone: M13 → M12
another PDT+ is dependent on this one, either this needs to be PDT+, or we need
to downgrade 19187
Whiteboard: [PDT-]
Putting this on the PDT- radar.  We agreed to not fix, but workaround 19187 by
increasing partition size.
Target Milestone: M12 → M13
moving to m13
Target Milestone: M13 → M14
I'm on vacation until Jan 10; moving this to M14
Target Milestone: M14 → M16
moving to m16 - load balance
Putting dogfood in the keyword field.
Keywords: dogfood
Summary: [DOGFOOD]Need low-memory handling code → Need low-memory handling code
Blocks: 4201
m17
Target Milestone: M16 → M17
Make jrgm QA contact, since he sees crashes resulting from low memory sitations 
frequently.

Brendan: this bug calls for some memory pressure API.
Nominating for nsbeta2, since jrgm sees low memory crashes frequently.
Keywords: nsbeta2
QA Contact: leger → jrgm
Summary: Need low-memory handling code → Need low-memory handling code to avoid crash
Whiteboard: [PDT-]
Putting on [nsbeta2-][dogfood-] radar.  Puttig on nsbeta3 keyword for work on 
that release.
Keywords: nsbeta3
Whiteboard: [nsbeta2-][dogfood-]
Keywords: arch
I'd like to add something to nsIAllocator to allow memory flushers to be 
registered. Is this what you had in mind Simon?
Yes. And I need to do some mac code to detect and respond to low-memory 
situations.
Simon: I reworked nsIAllocator to be nsIMemory, which now includes the 
additional methods:

    void registerObserver(in nsIMemoryPressureObserver obs);
    void unregisterObserver(in nsIMemoryPressureObserver obs);

where:

interface nsIMemoryPressureObserver : nsISupports
{
    const unsigned long REASON_ALLOC_FAILURE = 0;
    const unsigned long REASON_HEAP_MINIMIZE = 1;

    /**
     * Called in response to a low-memory condition. 
     * @param reason - either REASON_ALLOC_FAILURE when alloc or realloc 
     *                 fails, or REASON_HEAP_MINIMIZE when heapMinimize is
     *                 called
     * @param requestedAmount - either the size requested by alloc or realloc,
     *                 or 0 if heapMinimize was called
     */
    void flushMemory(in unsigned long reason,
                     in size_t requestedAmount);
};

so you be able to hook mac-specific flushers into this. 
Adding crash to keyword field.
Keywords: crash
*** Bug 41607 has been marked as a duplicate of this bug. ***
Keywords: relnote
Removing whiteboard entries for reconsideration.  I don't understand how a 
dogfood+ bug could be closed as a dup of this, yet this isn't even nsbeta2+.  
Bumping up the partition does not fix this, at most it delays the problem while 
raising the bar on target machines. Isn't it risky to wait until b3 to add code 
to handle such fundamentals?  Also, what would the relnote say - "Will crash 
frequently at random"?
Whiteboard: [nsbeta2-][dogfood-]
I agree with trudelle. And to clarify the effects of increasing the partition 
size: giving the app more memory will not always avoid out of memory crashes. 
This is because there are complex interactions between our memory allocators 
(which grab a chunk of the heap on startup), and memory allocation used for 
GWorlds and other toolbox structures (used for images and double buffering). We 
really need some new code that does the right thing under low memory.
Simon: waiting for a better solution, we still need to increase the memory 
partition. These days, the daily optimized builds don't even allow me to have a 
browser window and check my mail. They have a 12Mb partition, we could increase 
it to 16Mb.
[dogfood-][nsbeta2-] would like to have for beta3
Whiteboard: [dogfood-][nsbeta2-]
Fine. I opened bug 41760 to have a solution for beta2.
Pierre: the app uses temp memory when it needs to, for malloc/new as well as the 
backing store GWorlds, so theoretically there should be no need to increase the 
partition size. However, there is code that fails to be smart about GWorld 
allocations in temp mem, so a partition size increase may be in order.

We should set the partion size so that with a single browser window, we don't 
have to alloate any temp mem blocks. I think, now, we allocate 4 1Mb blocks, so 
this might call for increasing the partition size by about 4Mb.
Blocks: 43060
I set my new Mac to millions of colors and crashed today opening a new window
(one time I saw an assert for failing to allocate a gworld right before I
crashed).  My work-around is to set my monitor to thousands of colors.

Also, I saw my memory partition nearly full (only about 400K spare).
Blocks: 3412
Keywords: footprint
removing status whiteboard text and setting to m18
Keywords: nsbeta2
Whiteboard: [dogfood-][nsbeta2-]
Target Milestone: M17 → M18
Putting on [dogfood-] radar.  Not critical to everyday use. Putting on [NEED 
INFO] radar. PDT needs to know impact to user and risk of fix to make a call on 
this bug. How often does this happen?  Has anything changed since the last two 
times it's been given a -?  Is this worth pulling pr2 off the wire.
Whiteboard: [dogfood-][NEED INFO]
Sooner or later, the user is going to crash hard if this bug is not fixed. When/
if they crash will depend on their partition size, how many pages they have 
loaded, their screen depth, the number of windows open, and how big those windows 
are. And simply increasing the partion size will not make the crashes go away, 
necessarily.

What more info do you need?
Blocks: 46000
Whiteboard: [dogfood-][NEED INFO]
Putting on [nsbeta2+][dogfood-] radar.  Does not need a fix ASAP for daily work, 
but we should fix this for beta2. 
Whiteboard: [nsbeta2+][dogfood-]
You really meant nsbeta2+? Not nsbeta3+?
Fix checked in. I now implement the traditional Mac application low memory 
handling, with a 32k memory reserve handle, and a GrowZoneProc.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Well, I've been beating up my Mac, and I cannot trigger the sort of memory
problems that I had in the past (where my windows would, er, evaporate at 
certain times [evaporate being a highly technical term]). I played with
my shrinking my preferred size, loading large pages while resizing, resizing
with cmd held down, multiple windows, etc. ... Marking verified fixed.
Status: RESOLVED → VERIFIED
Hrmm. I've seen a commercial build go belly-up with a dsMemFullErr a few times 
now. I think there is still work to do here.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Removing nsbeta2+; the additional work is not landing for PR2.
Whiteboard: [nsbeta2+][dogfood-] → [dogfood-]
setting to nsbeta3+, setting priority to p:2 -- Simon, need your input
Priority: P3 → P2
Whiteboard: [dogfood-] → [nsbeta3+][p:2][dogfood-]
update whiteboard
Hardware: All → Macintosh
Whiteboard: [nsbeta3+][p:2][dogfood-] → [nsbeta3+][p:2][dogfood-][Mac issue]
PDT agrees P2 if it still is reproducible
Whiteboard: [nsbeta3+][p:2][dogfood-][Mac issue] → [nsbeta3+][p:2][dogfood-][Mac issue][PDTP2]
Depends on: 46337
Mac fixes are checked in. This is still waiting on 52402.
Depends on: 52402
Marking this fixed now that imagelib flushing works.

To test: open lots of browser windows, mail, composer, and load some heavy web 
pages (http://www.voodooextreme.com, cnn.com, news.bbc.co.uk etc). However much 
you load, it should not crash.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Hrmm, things are still bad here. Reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
*** Bug 53633 has been marked as a duplicate of this bug. ***
Fixes checked in; there was a bug where I was not making the reallocated memory 
buffer handle purgeable, and I tweaked the numbers so that GFX leaves a bit more 
free heap space
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
> However much you load, it should not crash.

Okay, reopening on that basis, but Simon can use his own judgement on how to 
deal with this. I must say, though, that the performance is really quite good
and the first time I managed to crash the app, I had ~25 browser windows open 
to home.netscape.com, a 3700 message IMAP folder open, AddressBook, and 9 
Composer windows open (that's a lot!!, and more than an average user would 
use). 

However, after the crash, I devised a random page disher and set it as my 
home page (http://jrgm.mcom.com/cgi-bin/random-page.pl) where the pages are
the ones above plus slashdot, home.netscape.com, an lxr page, and tinderbox.
With that, I usually hit a crash at about page 9 or 10, with this kind of 
info from 'how', 'hc all', 'ht', and 'sc'.

Bus Error at FFC66426 _InitProcMenu+00246
while reading word from 1A23ABFF in User data space

Checking all heaps
 The System heap at 00002800 is ok
 The ROM read-only heap at 00028F20 is ok
 The Process Manager heap at 01153AB0 is ok
 The ÒNetscape 6Ó heap at 0E176040 is bad
  This block's back pointer doesn't point to the previous block.
 Block header
  0EE3C9D0  0E01 2020 AB00 0004  0000 0090 00A5 79FC  ¥¥  «¥¥¥¥¥¥�¥¥yü
 The ÒShareWay IP Personal BgndÓ heap at 0F26A670 is ok
 The ÒFile Sharing ExtensionÓ heap at 0F339280 is ok
 The heap at 0F376DB0 is ok
 The ÒFinderÓ heap at 0F3936D0 is ok
 The ÒTime SynchronizerÓ heap at 0F496B30 is ok
 The ÒOpen Transport SNMPÓ heap at 0F4BA390 is ok
 The ÒNorton SchedulerÓ heap at 0F52B3F0 is ok
 The ÒFolder ActionsÓ heap at 0F5BC450 is ok
 The ÒFBC Indexing SchedulerÓ heap at 0F62D4B0 is ok
 The ÒControl Strip ExtensionÓ heap at 0F667970 is ok
  System heap high free space + TempMem low free space = #197049584 (#187M)

 Totaling the ÒNetscape 6Ó heap at 0E176040
 The ÒNetscape 6Ó heap at 0E176040 is bad
  This block's back pointer doesn't point to the previous block.
 Block header
  0EE3C9D0  0E01 2020 AB00 0004  0000 0090 00A5 79FC  ¥¥  «¥¥¥¥¥¥�¥¥yü

 Calling chain using A6/R1 links
  Back chain  ISA  Caller

(for comparison, I can do the same test on win32 and linux for >20 pages, by
which time the Mem size of netscape is > 96MB, and I have put the OS into 
paging thrash.). 

[Sidenote: I think the Mac new window performance is better than Windows and 
Linux when showing the new chrome window! Way to go.].
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
What killed you there is heap corruption, not simple out of memory problems. I 
filed a new bug, bug 54023, on the heap corruption problem, and am going to close 
this bug again.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Brade and other still see low memory crashers. I see that it's possible to get 
into this state when nothing can be flushed from imglib caches. More tuning 
required here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
setting to nsbeta3-, and rtm+

it's a crash, which can and should be resolved for rtm
Keywords: rtm
Whiteboard: [nsbeta3+][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm+][p:2][dogfood-][Mac issue][PDTP2]
Target Milestone: M18 → M19
Simon, please include the required information per the rtm checkin rules
Whiteboard: [nsbeta3-][rtm+][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm+ NEED INFO][p:2][dogfood-][Mac issue][PDTP2]
What's killing us here are the additional handles created for each NewGWorld 
call. Each one creates 29 handles. When allocating a 1-bit GWorld (e.g. for a GIF 
mask), one of those handles is 4.6K in size. These go into the heap even if the 
image bits go into temp mem.

So, possible solutions to fixing this:
1. Leave lots of heap space free, and hope that someone does not create enough 
images that they run out.
2. Use Heap zones to ensure that GWorld bits are controllable, and don't fill the 
heap.
3. Don't use GWorlds at all, reverting to the old hand-rolled, new[] buffer 
allocations for image bits.
4. Don't use GWorlds, but create CGrafPorts, PixMapHandles etc by hand.
5. Don't use GWorlds for 1-bit masks, using hand-rolled BitMaps instead (there is 
a technote on how to do this).

Others?
removing + per pdt sw rules
Whiteboard: [nsbeta3-][rtm+ NEED INFO][p:2][dogfood-][Mac issue][PDTP2] → [nsbeta3-][rtm NEED INFO][p:2][dogfood-][Mac issue][PDTP2]
Are we crashing inside NewGWorld()? If not, there is no reason to remove it. If 
we do, let's make sure we have enough free mem before calling it.
There is plenty of reason to not use GWorlds for images, and hand-roll PixMaps 
instead (as dcone's old code used to do) -- it would save 2-4K of heap data per 
image.

Where the crash happens can vary; it can be any Toolbox call that allocates 
memory in the heap. In debug builds it's often in TextEdit while outputting to 
the console, but it could be in NewRgn or NewGWorld.

Right now, if we detect insufficient memory in the heap, we call NewGWorld with 
the useTempMem flag. But this still puts 29 small handles in the heap -- only the 
pixels go into temp mem. Maybe a cheap solution to this bug would be to simply 
return an out of memory error in this case, and not attempt to allocate the 
GWorld at all.
I attach some diffs that I think are the safest way to fix this now.

mozilla/gfx/src/mac/nsDrawingSurfaceMac.cpp:
Change some constants so that we go to temp mem for drawing surface allocations 
(of which there are few, and which are normally big) earlier, leaving more heap 
mem free. Go to temp mem if < 1024k (max) or 512k (contiguous) is free.

mozilla/gfx/src/mac/nsImageMac.cpp
Similarly to the above patch, I changed some constants so that we go to temp mem 
earlier (1024k max, 768k contig). However, I also changed the logic so that we'll 
acutally refuse to allocate at all if memory is below a lowet threshold (512k 
max, 384k contig). This ensure that we never totally fill the heap with the heap-
based handles that GWorlds create (even when allocating the pixels of the GWorld 
in temp mem).

It is safe to allocate GWorlds; these failure paths have been tested, and imglib 
recovers safely. If the images that fail to allocate are chrome images, you may 
see missing images in the UI. However, this is preferable to crashing.

Longer term, I think we need to go back to hand-rolling PixMaps for image data.
s/lowet/lower
s/is safe to allocate/is safe to fail to allocate
r=pierre

sr=dcone
change rtm NEED INFO to rtm+
Keywords: nsbeta3
Whiteboard: [nsbeta3-][rtm NEED INFO][p:2][dogfood-][Mac issue][PDTP2] → [rtm+][p:2][dogfood-][Mac issue][PDTP2]
PDT says rtm++, please land on branch ASAP.
Whiteboard: [rtm+][p:2][dogfood-][Mac issue][PDTP2] → [rtm++][p:2][dogfood-][Mac issue][PDTP2]
Probably too late in the game, but Mac Mozilla Installer build 2000101808 blew
up in the manner John Morrison mentioned.

When I crashed, I had perhaps 10 navigator windows open, one of them to
http://www.apple.com/ which AFAIK has a QuickTime movie on it.  I crashed so
hard, MacsBug wouldn't even give me a stdlog -- all I got was this:

 MacsBug 6.6.3, Copyright Apple Computer, Inc. 1981-2000

Bus Error at FFC6640A _DeleteMenu+0008A
while reading long word from 4F430000 in User data space

  18-Oct-2000 9:48:44 PM (since boot = 18 minutes)
  Current application is ?Mozilla?
  Machine = #406 (NewWorldMac), System $0904, sysu = $01008000
  Unable to access that address
  PC: FFC66424
  Frame Type: B008
 Log file already open

Dunno if this helps, but thought I might as well post anyway.  
Fix checked into trunk and branch. I spun off bug 57327 for going back to using 
PixMaps here.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
I can't find any setting of the Mac application heap that will let my debug build 

of Mozilla load its chrome. I am very tempted to back out this change to 

nsImageMac.cpp. If I revert to v1.30, things work.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
beard: Could you give more details?  I have a fresh debug build from tonight that 
runs perfectly well with the different chromes and within the default 16597 Kb 
partition (VM off).
beard: use ZoneRanger to look at your heap. Why is yours different?
beard: are you running the Boehm GC, or MJR stuff, or something else that eats 
heap space?
Beard was running with the GC stuff, which never called MaxApplZone(). He's gonna 
fix that.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Say, Simon, et al. Given that this bug has been reopened 5 times, and with 
that other memory bug (low pref size) floating around, I don't want to confuse 
matters by doing another reopen.

Performance in this regard is mightily improved. But by the arbitrary test of 
loading from a pallette of heavy web pages (dished from my startup URL), while 
also having mail, composer, etc. open -- at the point where I went to get my 
45th browser window open, I hit an unmapped memory exception with the stack in 
a call to a factory (in CSS Style somewhere). 

Should I reopen this bug or should I file a new bug for sometime after RTM
(or would you prefer to tell me "Don't do that ...").
I certainly did not test as thoroughly as you did but I don't crash anymore.  
However, and this is a memory-related problem too, I hit constantly bug 57835 / 
bug 58490 where the chrome falls apart after opening a few windows of different 
types.

If we still have a little bit of time to spent on low-mem problems, I think it 
would be better to focus on the problems with the chrome (apparently fixed today, 
in the builds tomorrow) than crashes that require dozens of browser windows.
Oh, absolutely -- if we reopened this bug, it would be for post-RTM work. 
There's no "flip one switch" at hand, I don't think, and the "missing chrome" 
is a far more tangible bug, than some idiot (me) opening 45 browser windows :-]

At any rate, this is fine for the branch, so I'm marking it vtrunk, and 
we can pick it up later, or open a brand-new shiny bug for later.
Keywords: vtrunk
jrgp: please open a separate bug with your CSS-related crash, with MacsBug log is 
availabule. Thanks.
Product: Browser → Seamonkey
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: