Closed Bug 364175 Opened 18 years ago Closed 16 years ago

[OS/2] always crash after extended uptime

Categories

(SeaMonkey :: General, defect)

SeaMonkey 1.1 Branch
x86
OS/2
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: mrmazda, Unassigned)

Details

(Keywords: crash)

Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.1.1) Gecko/20061215 SeaMonkey/1.1
This has been going on a very long time, at least 2 months, but probably more like 4 months or longer.

To reproduce:
1-open browser
2-open at least 15 tabs, preferably 20 or more, including some with big buglists
3-open mailnews
4-open CZ to 12 or more tabs on one or more servers
5-surf normally
6-send and receive lots of mail (500 or more average per day)
7-read newsgroups on several servers

Actual behavior:
1-crash after 24 or more hours uptime

Expected behavior:
1-no crash before at least 5 days uptime

Important note:
Peter's highmem enabled 1.1 builds do not suffer this problem

Additional notes:
1-I got about 50 hours uptime out of the last start, but that began on a Friday afternoon, so CZ, mail and browser usage were all somewhat lower than normal. Crash can happen any time after about 24 hours uptime with more typical usage.
2-RAM consumption is generally upwards of 300M out of total real RAM of 512M when crash occurs.
3-Before this started I could run up to about 5 days uptime before exhausting real RAM and still not crash, which I can still do with highmem enabled builds.
I don't think this is something that we can address in the 1.8 branch any more. There are many places in the Mozilla code where failed memory allocation will cause crashes and checking if memory allocation worked basically depends on bug 353144.
(I tried to address this in one place by bug 351943 but that is really not the way to go...)
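
As a generic illustration of what "checking if memory allocation worked" means, here is a minimal C++ sketch. It is not taken from the Mozilla tree, and the names `TryAllocBuffer` and `FillBuffer` are hypothetical. It assumes a standard-conforming compiler where `new (std::nothrow)` returns a null pointer on failure, so the caller can back out instead of dereferencing null.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper, not Mozilla code: try to allocate a byte buffer.
// new (std::nothrow) yields nullptr on failure instead of throwing.
std::unique_ptr<char[]> TryAllocBuffer(std::size_t size) {
    return std::unique_ptr<char[]>(new (std::nothrow) char[size]);
}

// Fills a temporary buffer with `value`.
// Returns false if the buffer could not be allocated, so the caller
// can report an error instead of crashing on a null pointer.
bool FillBuffer(std::size_t size, char value) {
    std::unique_ptr<char[]> buf = TryAllocBuffer(size);
    if (!buf) {
        return false;  // out of memory: fail gracefully
    }
    for (std::size_t i = 0; i < size; ++i) {
        buf[i] = value;
    }
    return true;
}
```

Code that skips the `if (!buf)` check works until the first failed allocation and then crashes, which would fit a crash that only shows up after long uptime with high RAM use.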

What would help is to track where the most RAM allocation happens (but I don't really know how). Then I could take a look at why that is and maybe stop it. Btw, is bug 224661 still up-to-date?
(In reply to comment #1)
> I don't think this is something that we can address in the 1.8 branch any more.

Seems to me highmem enabling would be sufficient. Yesterday's crash caused me to waste 4 hours trying to recover from email corruption that followed from forgetting to restore the fixed xpcomct.dll after restoring the latest available highmem build.

> There are many places in the Mozilla code where failed memory allocation will
> cause crashes and checking if memory allocation worked basically depends on bug
> 353144.

All I see there is "You are not authorized to access bug #353144".

> Btw, is bug 224661 still up-to-date?
 
I've not seen anything to indicate otherwise.
Oh, yes. It's marked as security sensitive. Don't know why.

Highmem on the 1.8 branch is not going to happen. The changes needed are far too big and spread over too many parts of the code and build config... But I will release my unofficial SM 1.1 final before Xmas to include it.
(In reply to comment #3)
> Oh, yes. It's marked as security sensitive. Don't know why.

So how do we find out what it's about?
 
> Highmem on the 1.8 branch is not going to happen. The changes needed are far too
> big and spread over too many parts of the code and build config... But I will
> release my unofficial SM 1.1 final before Xmas to include it.

I really don't understand this response. You and others have said highmem for 1.8 can't be done, yet, you do it in your own build. If you can do it there, why can't it be done by default?
(In reply to comment #4)
> So how do we find out what it's about?

It's a technicality of compiler settings in C++. If you really want to read it, you should get somebody from the security group to CC you. Not sure if I am allowed to do that but I asked to remove the security flag.

> I really don't understand this response. You and others have said highmem for
> 1.8 can't be done, yet, you do it in your own build. If you can do it there,
> why can't it be done by default?

Well, it can be done in principle but it takes hours and hours to adapt the patches, get approvals and check them in. I don't have that kind of time (and I think nobody else does, either).
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.1.2pre) Gecko/20070131 SeaMonkey/1.1 and other recent normal builds still have this problem, but I did manage to get 71 hours and 56 minutes out of 0131. It crashed all by itself after about 1.5 hours "idle" (CZ running, biff working, background tabs with JS), while I wasn't even here. Peter's 1.1 highmem indeed does not suffer the problem, but it has at least one email problem since fixed in branch nightlies, forcing me to choose the less objectionable of the two problems.
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.1.3pre) Gecko/20070302 SeaMonkey/1.1.1

I wasn't paying attention to uptime and got about 66 hours before crash on this one. I was in the middle of composing a plain text email.
Mozilla/5.0 (OS/2; U; Warp 4.5; en-US; rv:1.8.1.4pre) Gecko/20070417 SeaMonkey/1.1.1 crashed (doing exactly nothing in browser, only CZ "active") after exactly 48 hours and 1 minute uptime
Yesterday's lasted only 24 hours and 21 minutes, and crashed while I was sleeping.
Next run crashed after 27 hours, while middle-clicking a link to open a Geocities frames page in a new tab.
It's interesting to see you keep track of the browser uptimes, but I don't think this helps at all to solve the problem. Are we even sure that this is related to the 512MB-RAM-per-process-limit? If not, you should perhaps run the Theseus memory monitor with some auto-update time and have a look at it every now and then. Maybe you can determine some process RAM limit where this happens.

To be honest, I don't think we will ever be able to exactly determine and fix the problem, despite your efforts to describe the problem well. So we could just as well mark this WONTFIX...
Browser uptimes are easy to track. Before restart, I just look at the timestamp on XUL.mfl and compare to crash time. It takes 10-20 minutes to restart, because I generally have 20 or more tabs open and have to figure out what they all were to restore them.
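
The timestamp comparison described above could be automated along these lines. This is only a sketch: `FormatUptime` and `UptimeFromMarker` are hypothetical names, the path to XUL.mfl has to be supplied by the caller, and it assumes the file's modification time really marks the browser start and that a POSIX-style `stat()` is available.

```cpp
#include <ctime>
#include <string>
#include <sys/stat.h>

// Formats the time between start and crash as "<hours>h <minutes>m".
std::string FormatUptime(std::time_t started, std::time_t crashed) {
    long long seconds = static_cast<long long>(std::difftime(crashed, started));
    if (seconds < 0) {
        seconds = 0;  // clock skew or wrong file: report zero uptime
    }
    long long minutes = seconds / 60;
    return std::to_string(minutes / 60) + "h " +
           std::to_string(minutes % 60) + "m";
}

// Uses the modification time of `marker_path` (e.g. XUL.mfl) as the
// start time. Returns false if the file cannot be inspected.
bool UptimeFromMarker(const char* marker_path, std::time_t crashed,
                      std::string* out) {
    struct stat info;
    if (stat(marker_path, &info) != 0) {
        return false;
    }
    *out = FormatUptime(info.st_mtime, crashed);
    return true;
}
```

For example, a start at time 0 and a crash 172860 seconds later gives "48h 1m".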

The problem doesn't exist in Weilbacher HIGHMEM enabled builds, which means "not fixable" and WONTFIX make little sense.

During the past two shorter-than-average runs, FF was also open for much longer than normal and with more tabs open on average, meaning much less real RAM was available.

Theseus I know nothing about.
If it is a RAM related problem that you see then the highmem builds will most likely have it, too. You will just encounter it much later. I am sorry to say this again, but I won't have time to go through the review/approval process for the highmem patch for branch. If you can do that or find somebody who does it, I would be happy to do the final checkins.
Much later may be an understatement. I just closed the highmem-enabled eSeaMonkey 1.1.2 after 5 days 0 hours 0 minutes uptime. That's my longest uptime by far since testing the highmem-enabled 1.1.1 several months ago, and the only one that lasted more than 60 or so hours.

RAM recovered was 458M. Lack of free RAM was why I closed it. It's turned into an even bigger memory hog than the memory hog it used to be. :-p
I replaced my 2 256M RAM sticks with 2 1024M RAM sticks. I was at 58 hours uptime with nightly rv:1.8.1.5pre 20070703 when instead of a crash, the UI got corrupted with black patches. Then on switching to the CZ window, I found all output pane fonts had disappeared or been replaced by underscores. With it thus useless, I had to close it, recovering just short of 300M by doing so. Several hours earlier I encountered a previously observed problem: a png file loaded in a tab too small for it to fit became fully visible only after several reloads.
I got a whopping 15 hours out of my first run of the standard (no HIGHMEM) 1.1.5 release.
You will be thrilled to hear that SM on trunk grows so quickly that even with highmem I only get around 15-30 _minutes_ out of that before it slows down my system.
Besides some smaller leaks in SM mail, my idea of the day is that kLibc is not very clever about re-using memory that it already freed. But I have a lot more debugging to do before I can make sure that this is really the case...
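
To show why poor re-use of freed memory would matter, here is a toy model in C++. It says nothing about how kLibc actually behaves: `ToyHeap` and `FootprintAfterCycles` are hypothetical names, and the allocator is deliberately simplistic (first fit, no splitting or merging of blocks). It only compares the heap footprint with and without re-use of freed blocks under a repeated allocate/free workload.

```cpp
#include <cstddef>
#include <vector>

struct Block {
    std::size_t offset;
    std::size_t size;
};

// Toy heap: either re-uses freed blocks (first fit) or always grows.
class ToyHeap {
public:
    explicit ToyHeap(bool reuse_freed) : reuse_freed_(reuse_freed) {}

    Block Alloc(std::size_t size) {
        if (reuse_freed_) {
            for (std::size_t i = 0; i < free_.size(); ++i) {
                if (free_[i].size >= size) {
                    Block b = free_[i];  // hand out the whole freed block
                    free_.erase(free_.begin() + static_cast<std::ptrdiff_t>(i));
                    return b;
                }
            }
        }
        Block b{top_, size};  // nothing suitable: grow the heap
        top_ += size;
        return b;
    }

    void Free(const Block& b) { free_.push_back(b); }

    // Total address space the heap has claimed so far.
    std::size_t Footprint() const { return top_; }

private:
    bool reuse_freed_;
    std::size_t top_ = 0;
    std::vector<Block> free_;
};

// Allocates and immediately frees `size` bytes, `cycles` times.
std::size_t FootprintAfterCycles(bool reuse_freed, int cycles, std::size_t size) {
    ToyHeap heap(reuse_freed);
    for (int i = 0; i < cycles; ++i) {
        heap.Free(heap.Alloc(size));
    }
    return heap.Footprint();
}
```

With re-use, 1000 cycles of 4096 bytes keep the footprint at 4096 bytes. Without it, the footprint grows to 4096000 bytes even though nothing is live at the end, which is the kind of growth that would look like a leak from the outside.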
While investigating where all the memory on trunk goes (without success), I stumbled on a few bugs that seem relevant:
- There is bug 130157 which seems almost impossible to fix inside
  Mozilla (bug 130157 comment 50 has a good explanation of what happens).
  We certainly cannot attempt an OS/2-only solution.
- And as "lots of mail" was mentioned in this bug, bug 400589 seems
  especially relevant. Others that might be related are mentioned therein.
Overall, I still think this is WONTFIX or INVALID because in a general sense we will never get the data needed to fix this. We can only attack specific problems, and those are unlikely to be OS/2 specific.
The nightly 1.1.8pre build from less than 24 hours ago managed 11 hours and 1 minute before closing itself two hours after I went to sleep. So, I'm back to 20071129 SeaMonkey/1.1.7 (PmW), which like the PmW releases preceding it, runs as long as I let it. I did make one change since last testing a branch nightly, adding VIRTUALADDRESSLIMIT=1536 to CONFIG.SYS.
Severity: major → critical
Version: 1.8 Branch → SeaMonkey 1.1 Branch
Gecko 1.8 is basically dead and highmem is on by default for everything past that. As we don't have enough information to determine the underlying cause, I'm just going to resolve this incomplete.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE