204143 - crash, orkinHeap::Alloc uses new and throws on failure, oom

Reporter

Description

•

21 years ago

I believe i ran out of memory which BeOS was willing to give to mozilla

stack:
__throw
__builtin_new
orkinHeap::Alloc
morkZone::zone_new_hunk
morkZone::zone_grow_at
morkZone::zone_new_chip
morkZone::ZoneNewChip
morkPool::NewFarBookAtomCopy
...
I believe the line that failed was:
  91   void* block = ::operator new(inSize);
which means bug 149032 wasn't enough


> smfr: ok, so.. can you help me? :)
<smfr> timeless: change all new() to new(std::nothrow) ?
> oh joy
<smfr> that needs doing anyway
<smfr> NS_NEWNOTHROW

This looks useful: http://cpptips.hyperformix.com/cpptips/std_no_throw
in this specific case it's
#include <new> /*...*/
void* block = ::operator new(inSize, std::nothrow);

macro land coming up...
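
A minimal sketch of the nothrow form above, for reference. `AllocNoThrow` and `FreeNoThrow` are hypothetical names, not the real `orkinHeap::Alloc` signature; the only point is that the caller gets a null pointer to check instead of an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper (not the real orkinHeap::Alloc): reports failure
// through the return value and a null *outBlock instead of throwing.
bool AllocNoThrow(std::size_t inSize, void** outBlock) noexcept {
  assert(outBlock != nullptr);  // caller must supply an out-parameter
  *outBlock = ::operator new(inSize, std::nothrow);
  return *outBlock != nullptr;
}

// Matching release; ::operator delete accepts a null pointer.
void FreeNoThrow(void* block) noexcept {
  ::operator delete(block);
}
```

Every caller then has to handle the `false` return, which is the part that takes the real work.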

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

Serge Gautherie (:sgautherie)

Updated

•

16 years ago

Assignee: timeless → nobody

QA Contact: granrosebugs → build-config

Philip Chee

Updated

•

15 years ago

Product: SeaMonkey → Core

Whiteboard: CLOSEME?

timeless

Reporter

Comment 1

•

15 years ago

we haven't really resolved this, it's still hit all over. it might be one of the older bugs on the subject and it actually has the right starting point. that said, someone else will probably use some other bug to fix the problem (new throws an exception for oom).

(not currently active) Ted Mielczarek

Comment 2

•

15 years ago

Can someone triage this to a mork component so I don't have to get bugmail about it, at least?

David :Bienvenu

Updated

•

15 years ago

Component: Build Config → Database

Product: Core → MailNews Core

QA Contact: build-config → database

Joshua Cranmer [:jcranmer]

Comment 3

•

15 years ago

I doubt this will really get fixed, given that mork is on its way out...

Whiteboard: CLOSEME?

David Bradley

Comment 4

•

15 years ago

Odd that you should comment on this bug. While tracking down crash reports for Songbird last week, I noticed a bunch of crashes on RaiseException out of new, from various parts of XUL, XPConnect, and other places. I haven't had a chance to check it out, but is XULRunner now building with exceptions on? So did exceptions get turned on a while back? Maybe with the jemalloc code?

Benjamin Smedberg

Comment 5

•

15 years ago

No. But ::operator new is not compiled by mozilla, it's provided by the CRT. It's possible that it will still throw regardless of whether exception handling is on in Mozilla or not.

timeless

Reporter

Comment 6

•

15 years ago

ted: why bother, we're now getting these crashes everywhere, a few bug reports a week (bug 481401, bug 477571). we could move it to xpcom if you prefer, or build config ;-b

jcranmer: this really isn't specific to mork, and it does need to be fixed, i believe this bug is probably one of the (or even my) oldest reports on the problem

dbradley: we switched compilers a while back actually, when we moved from vc71 to vc8, these exceptions had to happen.

as for why we're getting more oom crashes than we used to (as we've had vc8 for a while), i dunno, most likely CC is failing to retain memory or something is leaking (as one of the other bugs complains).

in the end we either have to try hacking jemalloc to do something evil

timeless

Reporter

Comment 7

•

15 years ago

...in the end we either have to try hacking jemalloc to do something evil
(changing how operator new works in the crt), or we have to introduce something
like the macro described by smfr in comment 0.
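
A rough sketch of what the macro route could look like. The definition of NS_NEWNOTHROW is not shown anywhere in this bug, so `SKETCH_NEW_NOTHROW`, `Hunk` and `PushHunk` below are made-up names for illustration only.

```cpp
#include <cassert>
#include <new>

// Hypothetical macro, assumed for this sketch; not the actual NS_NEWNOTHROW.
#define SKETCH_NEW_NOTHROW new (std::nothrow)

// Hypothetical payload type standing in for whatever is being allocated.
struct Hunk {
  Hunk* next = nullptr;
  char bytes[256] = {};
};

// Prepends a new hunk to the list. Returns false on allocation failure
// and leaves the list unchanged.
bool PushHunk(Hunk** head) {
  assert(head != nullptr);
  Hunk* hunk = SKETCH_NEW_NOTHROW Hunk();
  if (!hunk) {
    return false;
  }
  hunk->next = *head;
  *head = hunk;
  return true;
}
```

The macro only changes what `new` does on failure; each call site still needs its own null check.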

David Bradley

Comment 8

•

15 years ago

Apparently post VC6, new's implementation changed to always throw. So at least now all the platforms are in the same boat I think.

So either new(std::nothrow) or set a new handler and do "something" when out of memory. But yes, this really isn't specific to mork anymore.
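
For the new handler option, a minimal sketch, assuming the "something" is releasing an emergency reserve so the retry inside operator new has a chance. `gReserve`, `OnOutOfMemory` and `InstallOomHandler` are hypothetical names; only `std::set_new_handler` is the standard call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical emergency reserve, released the first time allocation fails.
void* gReserve = nullptr;

// Called by operator new when an allocation fails. Returning makes
// operator new retry the allocation.
void OnOutOfMemory() {
  if (gReserve) {
    std::free(gReserve);  // give memory back so the retry may succeed
    gReserve = nullptr;
    return;
  }
  // Nothing left to release: uninstall the handler so the next failure
  // ends the retry loop (bad_alloc, or null for the nothrow form).
  std::set_new_handler(nullptr);
}

void InstallOomHandler(std::size_t reserveBytes) {
  gReserve = std::malloc(reserveBytes);
  std::set_new_handler(OnOutOfMemory);
}
```

This only delays the failure; once the reserve is gone the code is back to either a thrown exception or a null return.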

Wayne Mery (:wsmwk)

Comment 9

•

15 years ago

has signature morphed to arena_malloc_small, i.e. bp-e1fc3334-c858-4968-8f47-0e45d2091124 bug 530044?

or one of these others?
http://crash-stats.mozilla.com/query/query?product=Thunderbird&version=ALL%3AALL&date=&range_value=5&range_unit=days&query_search=stack&query_type=contains&query=orkinHeap%3A%3AAlloc&do_query=1

OS: BeOS → All

Summary: orkinHeap::Alloc uses new and throws on failure → crash, orkinHeap::Alloc uses new and throws on failure, oom

Wayne Mery (:wsmwk)

Comment 11

•

13 years ago

same as the following...?
operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**)
bp-a37b5722-061f-4050-a780-73ea12110117
Unhandled C++ Exception
0x778cfbae
0	kernel32.dll	RaiseException	
1	mozcrt19.dll	_CxxThrowException	throw.cpp:159
2	mozcrt19.dll	operator new	objdir-tb/mozilla/memory/jemalloc/src/new.cpp:57
3	thunderbird.exe	orkinHeap::Alloc	db/mork/src/orkinHeap.cpp:90
4	thunderbird.exe	morkNode::MakeNew	db/mork/src/morkDeque.cpp:62
5	thunderbird.exe	morkFactory::OpenFileStore	db/mork/src/morkFactory.cpp:545
6	thunderbird.exe	nsMsgDatabase::OpenMDB	mailnews/db/msgdb/src/nsMsgDatabase.cpp:1140
7	thunderbird.exe	nsMsgDatabase::Open	mailnews/db/msgdb/src/nsMsgDatabase.cpp:1026
8	thunderbird.exe	nsMailDatabase::Open	mailnews/db/msgdb/src/nsMailDatabase.cpp:111
9	thunderbird.exe	nsMsgDBService::OpenFolderDB	mailnews/db/msgdb/src/nsMsgDatabase.cpp:147

Wayne Mery (:wsmwk)

Comment 12

•

13 years ago

(In reply to comment #8)
> Apparently post VC6, new's implementation changed to always throw. So at
> least now all the platforms are in the same boat I think.
> 
> So either new(std::nothrow) or set a new handler and do "something" when out
> of memory. But yes, this really isn't specific to mork anymore.


[@ morkZone::zone_new_hunk] also?
example
bp-dfac1ee0-3fb0-4c30-b94d-253e52110526
SIGSEGV
0x0
0	thunderbird-bin	morkZone::zone_new_hunk	morkZone.cpp:266
1	thunderbird-bin	morkZone::zone_grow_at	morkZone.cpp:242
2	thunderbird-bin	morkZone::zone_new_chip	morkZone.cpp:319
3	thunderbird-bin	morkZone::ZoneNewRun	morkZone.cpp:398
4	thunderbird-bin	morkPool::NewCells	morkPool.cpp:275
5	thunderbird-bin	morkPool::AddRowCells	morkPool.cpp:321
6	thunderbird-bin	morkRow::NewCell	morkRow.cpp:453
7	thunderbird-bin	morkRow::AddColumn	morkRow.cpp:885
8	thunderbird-bin	morkRowObject::AddColumn	morkRowObject.cpp:267
9	thunderbird-bin	nsAddrDatabase::AddCharStringColumn	nsAddrDatabase.cpp:2182

Crash Signature: [@ operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**) ] [@ morkZone::zone_new_hunk]

Wayne Mery (:wsmwk)

Comment 13

•

11 years ago

current crash sig looks to be moz_abort | arena_run_split | arena_run_alloc | arena_malloc | je_malloc | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**)

morkZone::zone_new_hunk ends after TB16.0.2

Crash Signature: [@ operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**) ] [@ morkZone::zone_new_hunk] → [@ operator new(unsigned int) | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**) ] [@ morkZone::zone_new_hunk] [@ moz_abort | arena_run_split | arena_run_alloc | arena_malloc | je_malloc | orkinHeap::Alloc(nsIMdbEnv*, unsigned int, void**)]

Nik B.

Comment 14

•

11 years ago

As I had mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=533313 I hit this issue because the process hit the 2GB per-process hard limit (I was on a 32-bit system at the time), meaning the system couldn't allocate more memory even if it did have plenty of memory to spare.

There's really not much of anything to do to work around that, except reduce memory usage and/or warn the user that Thunderbird is out of address space. I guess you could "delay" the issue if the user is running a system with a 3G/1G user-kernel split, but those users are rare... On 64-bit systems, a 64-bit version of TBird would have access to more virtual address space, making this a non-issue for all practical purposes.

Wayne Mery (:wsmwk)

Updated

•

11 years ago

Robert Kaiser

Comment 15

•

11 years ago

One problem could be that mork loads the whole database into memory, and so can OOM on huge databases. If we'd replace it with a reasonable, more modern DB that doesn't need to put everything into memory, we'd probably be better there.

Wayne Mery (:wsmwk)

Comment 16

•

11 years ago

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #15)
> One problem could be that mork loads the whole database into memory, and so
> can OOM on huge databases. If we'd replace it with a reasonable, more modern
> DB that doesn't need to put everything into memory, we'd probably be better
> there.

I believe it would be a mistake to assume big db is the majority of cases. And since Bug 723248 - add support for closing inactive databases - it should be more difficult to see runaway open dbs. But for sure many crashes are db related. The index db ranges from 2-20% of folder size (depends on what's in headers and whether user has custom headers defined in TB). So for example, if a user had many 4GB folders, 30 databases times 100 MB per index (just over 2%) is 3GB. But if everything is working properly there should not be 30 db files open.

In my wanderings of the bugs, crash reports and user support reports most OOM issues tie to
lightning
selecting thousands of messages (known flaw)
corrupt index
large numbers of contacts
google contacts sync

And I think to a lesser extent
mork issues
corrupt panacea.dat (perhaps mork related)

but recently we do see reports of runaway open database counts, even with the Bug 723248 defaults of
mail.db.idle_limit 30000
mail.db.max_open 30

Robert Kaiser

Comment 17

•

11 years ago

Of course, if we're using regular expressions, we might run into something like bug 837845, but open DB counts sound like something different.
That said, as the platform is seeing a ton of churn with JS improvements etc., Firefox has no mork or RDF stuff like mailnews still has for some things, and AFAIK neither Thunderbird nor SeaMonkey has any reasonable regular perf/memory testing/monitoring, it's quite possible that our code triggers some bad behavior in those platform changes, or the other way round, and we take a long time to notice it.

Wayne Mery (:wsmwk)

Comment 18

•

11 years ago

re comment 17, absolutely correct on all counts.  And one of our primary fears about TB now releasing essentially only ESR is that the time to detect most regressions becomes (theoretically) as long as a year, because of our smallish beta and alpha user base and the items you cited.

for the record, we do have a few reported cases of more than 30 databases being open, despite Bug 723248 - so it's not necessary to have huge databases to cause GBs of memory usage in such cases.

And, I often forget to ask this of users, but addons like thunderbrowse can of course cause issues to arise that would normally only be seen in browser-land.

Wayne Mery (:wsmwk)

Comment 19

•

11 years ago

https://getsatisfaction.com/mozilla_messaging/topics/thunderbird_is_crashing_after_5_seconds_in_safe_mode_or_reg_mode may be a current example

Ludovic Hirlimann [:Usul]

Comment 20

•

9 years ago

Removing myself from all the bugs I'm cced on. Please NI me if you need something on MailNews Core bugs from me.

BMO Automation

Updated

•

9 years ago

Wayne Mery (:wsmwk)

Comment 21

•

8 years ago

This may have morphed to something like https://crash-stats.mozilla.com/search/?signature=~morkZone&product=Thunderbird&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports but the crash rates are low (probably much lower than when the bug was filed), I don't see anything that seems actionable

Status: NEW → RESOLVED

Closed: 8 years ago

Depends on: 149032

Resolution: --- → WORKSFORME