Closed Bug 481683 Opened 16 years ago Closed 15 years ago

Various bugs in MMgc

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: lhansen, Unassigned)

References

Details

Attachments

(1 file)

As I'm undertaking an audit of the MMgc code I'm finding odds and ends that are wrong or at least askew. I'll be attaching patches to this bug as I uncover things; feel free to merge or not. Eventually I expect to file various proper bugs. Some of the problems I uncover may block the March release; I'll try to note it if they do.
Notes a bunch of documentation bugs. Fixes compilation for !virtual memory on MacOS X, though leaves the general case unsolved (none of the other platforms will compile for this case). IMO the compilation problem should block the March release.
Blocks: 478870
Sweet, glad to have another pair of eyeballs on the code! It needs it.

General comments: I've had a recent epiphany that we should enable the !virtual memory code on all systems and make it a runtime switch. The idea is that Mac OS X 10.4 would use the !vm code since it's so broken; then the vm code gets simpler and cleaner since 10.5/win/linux/solaris are pretty similar, and we know the !vm code is always tested/compiling since we run the sanities on 10.4.

There is one direct client of GCHeap::Alloc: the JIT.
There's probably an OOM handling bug in ExpandHeapLocked, near the end. Here's the code:

    if (newRegion == NULL) {
        // Ugh, FUBAR.
        return false;
    }

This code is probably wrong. All earlier code in this function makes a point of unreserving / decommitting if something fails and we have to back out. Here we return if we can't allocate the memory for a Region, but we're leaving behind a new block array that points to heap blocks for the new region, yet no region will map to those blocks. At a minimum this is a storage leak (the large allocation fails but the client handles the error return from PleaseAlloc, and now the memory in the blocks array is sitting unused). It might be better to try to allocate the Region object before committing the new blocks array and deleting the old one; this would allow the allocating code to back out properly.
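For illustration, a minimal, self-contained sketch of the ordering being suggested - grab the small Region record before committing the new blocks array, so the failure path has nothing to unwind. The names and structure here are illustrative stand-ins, not the actual GCHeap code:

    #include <cstdlib>

    struct Region { };   // stand-in for GCHeap's per-region bookkeeping record

    // Hypothetical expansion step: acquire the bookkeeping object first.
    bool ExpandSketch()
    {
        Region* newRegion = static_cast<Region*>(std::malloc(sizeof(Region)));
        if (newRegion == NULL)
            return false;            // nothing committed yet, nothing to back out

        // Only now commit the new blocks array and delete the old one; if that
        // part fails, releasing 'newRegion' is trivial.
        // ... commit / copy steps elided ...
        return true;
    }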
(In reply to comment #2)
> I've had a recent epiphany that we should enable the !virtual memory code
> on all systems and make it a runtime switch. The idea is that Mac OS X 10.4
> would use the !vm code since it's so broken, then the vm code gets simpler and
> cleaner since 10.5/win/linux/solaris are pretty similar and we know the !vm
> code is always tested/compiling since we run the sanities on 10.4.

At a minimum it seems like a good idea to make it possible to have both paths available in a build. I don't know why you would require it if you could avoid it, though. I like what you suggest re: testing. If we don't end up doing that we still need to start testing non-virtual-memory builds.
MMgc::DebugSize() should be inline in some header file, since the value it returns is a constant. Making it inline would reduce code size overall.
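For illustration, a sketch of what a header-level definition could look like, assuming DebugSize() just returns a fixed per-object debug overhead; the macro names and the constant are assumptions for the sketch, not the shipped MMgc code:

    #include <stdint.h>
    #include <stddef.h>

    // Hypothetical: DEBUG_OVERHEAD_WORDS stands in for whatever constant the
    // real DebugSize() computes; DEBUG stands in for the real debug define.
    #define DEBUG_OVERHEAD_WORDS 4

    inline size_t DebugSize()
    {
    #ifdef DEBUG
        return DEBUG_OVERHEAD_WORDS * sizeof(int32_t);   // folds to a constant at each call site
    #else
        return 0;
    #endif
    }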
In FixedAlloc::Alloc, this is presumably a violation of the function's API:

    if(!m_firstFree) {
        if(CreateChunk() == NULL) {
            GCAssertMsg(false, "Out of memory!");
            return NULL;
        }
    }

Instead of returning NULL this should probably enter the OOM handling system? So far as I can tell, no other path in Alloc returns NULL.
OOM handling happens in GCHeap, i.e. callers of GCHeap::AllocBlock won't have to deal with NULL returns, so this becomes dead code.
Makes sense. Nobody else checks the return value from CreateChunk; it should probably just be made 'void', and then the code becomes as you suggest:

    if(!m_firstFree)
        CreateChunk();
From the looks of it the Unix code for GetPerformanceFrequency should not have shipped, and there's anecdotal evidence that the collection frequency on Linux can be far lower than on other platforms, leading to much larger heaps. On memory-limited systems, especially when the out-of-memory handling is "abort & go home" and panic-collecting on OOM may not hold off the abort, response time must be secondary to staying within memory bounds. (I'd argue that that is true on desktop too and that the memory bounds on desktop should just be looser.)
FixedMalloc::Alloc:

    inline void* Alloc(size_t size)
    {
        void *item;
        GCAssert(size + 3 > size); // overflow detection
        if(size+3 < size)
            return NULL;

I assume the NULL return is a contract violation? Nobody else seems to return NULL for anything.
FixedMalloc::Alloc again (different issue):

    inline void* Alloc(size_t size)
    {
        void *item;
        GCAssert(size + 3 > size); // overflow detection
        if(size+3 < size)
            return NULL;
        size = (size+3)&~3;

This rounds up to a four-byte boundary. Should this be an eight-byte boundary on 64-bit systems?
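If an eight-byte boundary is indeed what's wanted on 64-bit, a sketch of pointer-width-aware rounding; the helper name is made up for illustration, and the existing size+3 < size overflow check would need the same treatment:

    #include <stddef.h>

    inline size_t RoundUpToWordBoundary(size_t size)
    {
        const size_t align = sizeof(void*);              // 4 on 32-bit, 8 on 64-bit
        return (size + (align - 1)) & ~(align - 1);      // round up to that boundary
    }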
FixedMalloc::Alloc: GCHeap::Alloc never returns NULL, so simplify this code.

    void *item = m_heap->Alloc(blocksNeeded, true, false);
    if(!item) {
        GCAssertMsg(item != NULL, "Large allocation failed!");
    }
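A sketch of the simplification, assuming (as stated above) that GCHeap::Alloc either succeeds or enters OOM handling and never returns NULL:

    void *item = m_heap->Alloc(blocksNeeded, true, false);
    GCAssert(item != NULL);   // documents the invariant; the error branch above is dead code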
At GCHashtable.cpp:228 and GCMemoryProfiler.cpp:117 'malloc' is being called directly, despite the fact that the objects that own the calling methods are subtypes of GCAllocObject, which implements a 'new' operator that forwards to 'malloc' generally (except on Windows). The calls to 'malloc' are correctly paired with calls to 'free' so the code isn't buggy per se but this struck me as weird, since the GCAllocObject mechanism exists to handle just this type of allocation. (Also it should be VMPI_malloc and VMPI_free, possibly, but then there's no guarantee that those aren't pointing back to FixedMalloc, so maybe not :-)
The files MMgc.cpp, mmgc_stdint.h, and GCGlobalNew.cpp are all empty - delete them?
The file TypeTraits.h appears to be unused and related to some form of advanced C++ that we can't make use of - delete it?
I think so; the AIR team snuck it in.
I think GCHashtable and GCMemoryProfiler should use the VMPI functions. I assume those will always be the system allocator and should only ever be used where we explicitly want to avoid our allocator (I think the new GCAllocObject uses them, no?)
(In reply to comment #17)
> I think the GCHashtable and GCMemoryProfiler should use the VMPI functions, I
> assume those will always be the system allocator and should only ever be used
> where we explicitly want to avoid our allocator (I think the new GCAllocObject
> uses them no?)

That makes sense, but are there good reasons not just to use 'new' and 'delete' and not muck around with VMPI_malloc at all?

Why does OPTION_MALLOC exist? On all platforms except Windows it has no effect relative to using new (because new just redirects to malloc). On Windows it makes GCHashtable go to malloc instead of through GCAllocObject's 'new', which uses HeapAlloc. Does this have performance implications, or implications for measuring heap population, or other effects?
This stuff exists solely to prevent the memory profiler from profiling itself. If we use VMPI_malloc then the malloc problem goes away. We can't use new because we are allocating raw buffers, not GCAllocObject subclasses (new char[] would go to FixedMalloc).
(In reply to comment #15)
> The file TypeTraits.h appears to be unused and related to some form of
> advanced C++ that we can't make use of - delete it?

It's used by the avmplus List<> type to infer the type of the receiver object.
(In reply to comment #20)
> (In reply to comment #15)
> > The file TypeTraits.h appears to be unused ...
> It's used by the avmplus List<> type to infer the type of the receiver object.

So it is; my search missed that. The comment in the header says that it's not portable because not all C++ compilers implement what it needs. What should we do about that?
The MMgc code uses 'int' in various debugging contexts and in many places makes the explicit assumption that 'int' is 32 bits. That ain't necessarily so, and it would be better to switch to 'int32_t' if that's what's intended, as I suspect it is. Two items come to mind.

In RCObject a field 'padto32bytes' is declared 'int'; the fact that it 'knows' that the object is 28 bytes and that int is four bytes is sort of scary, and that this isn't checked anywhere (e.g. with an assert) is a bug that should be fixed. It's clearly not true on 64-bit systems because the constituent GCStack contains pointer fields.

In GCMemoryProfiler.h, GetRealPointer knows that the real pointer is the object pointer minus the size of two ints. Why int, and not int32_t? Anyhow, the fact that it is int is used in GCWeakRef.h, which picks up the first of those ints in a debug mode. A fixed-size type would be better here.

There are others; 'unsigned int' has been seen too.
(In reply to comment #22)
> the fact that it 'knows' that the object is 28 bytes and
> that int is four bytes is sort of scary

We have a compile-time assertion macro; we should be using it in cases like this. (Agreed about your meta-point though; naked "int" must be expunged.)
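For illustration, a minimal sketch of making a layout assumption explicit at compile time. The macro name is hypothetical (MMgc's own compile-time assertion macro is not shown here), and C++11 static_assert plus an illustrative struct are used only to demonstrate the idea:

    #include <stdint.h>

    // Hypothetical stand-in for MMgc's compile-time assertion macro.
    #define MMGC_STATIC_ASSERT(cond) static_assert(cond, #cond)

    struct ExampleHeader {        // illustrative type, not RCObject itself
        int32_t composite;
        int32_t zctIndex;
    };

    // Fails to compile if the size assumption is ever violated, instead of
    // silently producing a mis-padded object.
    MMGC_STATIC_ASSERT(sizeof(ExampleHeader) == 8);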
In RCObject::IncrementRef and DecrementRef we find this guard at the beginning:

    if(Sticky() || composite == 0)
        return;

Faster (or easier to optimize) would probably be:

    if((composite & ~STICKY_FLAG) == 0)
        return;
In RCObject:

    void setZCTIndex(uint32_t index) {
        GCAssert(index < (ZCT_INDEX>>8));
        GCAssert(index < ZCT_INDEX>>8);
        composite = (composite&~ZCT_INDEX) | ((index<<8)|ZCTFLAG);
    }

According to K&R, bit shift has higher precedence than the relational operators, so one of the two assertions is redundant - possibly a merge error.
There's a bug in how off-page bitmaps are managed during shutdown of GCAlloc. The code is this:

    GCAlloc::~GCAlloc()
    {
        while (m_firstBlock) {
            if(((uintptr_t)m_firstBlock->bits & 0xfff) == 0)
                m_gc->GetGCHeap()->Free(m_firstBlock->bits);
            ...
            GCBlock *b = m_firstBlock;
            UnlinkChunk(b);
            FreeChunk(b);
        }
    }

FreeChunk zeroes the off-page bitmap for the chunk, but the memory that it zeroes may have been freed by the first statement in the loop seen above. Since the freed block may have been unmapped, the VM might crash in FreeChunk when the bits are cleared. Even if that does not happen, FreeChunk then frees the bits by calling GC::FreeBits, and since the bits are not on an allocated page the result is presumably an inconsistent free list in the bit manager.
In GCAlloc::CreateChunk there is a test for whether GC::AllocBlock returned NULL, but that can't happen; ergo CreateChunk never returns NULL. CreateChunk is called in two places, and one of them checks for a NULL return, which likewise can't happen.
RE: ~GCAlloc, nice one! GCHeap doesn't have any checking that a block was written to after being freed (it should), and Free never decommits a block. There's another bug here in that the data for "bits" never shrinks. The GC doesn't even know about those pages; all it maintains is a set of size-segregated freelist pointers. This is why the Free for the whole bit block happens in ~GCAlloc. Lots of room for improvement here.
(In reply to comment #28)
> ... Free never decommits a block.

True, but... Free calls FreeBlock, which calls RemoveBlock, which calls RemoveRegion, which calls ReleaseMemory, which calls unmap - but only if the block that's being freed is the only block in a region. That does not currently happen because ExpandHeapLocked will not expand the heap with less than a minimum number of blocks. For the very smallest systems we may have to revisit that policy, though. So a minor point to keep in mind.

> There's another bug here in that the data for "bits" never shrinks.

Thanks, good to know. It seems to me that the off-page bitmap is a good idea in principle but will benefit only systems where there is a significant amount of paging, as what it does is concentrate writes onto a small number of pages; that only matters if those pages are to be written out again. And for a number of object sizes the writes will be on-page anyhow. (There might be some CPU cache effects, but with marking going on the cache is probably not very happy in any case.) Are there other benefits? A voice at the back of my head says "concurrency" but I don't trust it.
Spot the bug (in GCHeap::ExpandHeap). I've re-indented etc to make it easier.

    retval = ExpandHeapLocked(askSize);
    if (!retval && status == kNormal) {
        StatusChangeNotify(kNormal, kReserve);
        retval = ExpandHeapLocked(askSize);
        if (!retval)
            status = kReserve;
    }
    else
        status = kEmpty;
Not that ExpandHeap is called much at all; everyone seems to prefer to call ExpandHeapLocked directly.
status gets set to kEmpty on a successful expansion - untested, unused code at the moment.
Another one:

    uint64_t GC::GetPerformanceCounter()
    {
        ...
    #elif defined(AVMPLUS_UNIX)
        struct timeval tv;
        ::gettimeofday(&tv, NULL);
        uint64_t seconds = (uint64_t)(tv.tv_sec * 1000000);
        ...
    #endif
    }

As tv.tv_sec is a time_t and time_t is normally 'long', the multiplication will normally overflow (after about 00:40 in the morning). The cast needs to apply to tv.tv_sec. This may explain the wacky collection frequency on Unix discussed in comment #9.
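A self-contained sketch of the fix being described - widen tv_sec before the multiply so the arithmetic happens in 64 bits. The surrounding #ifdef structure is omitted, and the tv_usec term is included only on the assumption that the elided part of the original adds it:

    #include <stdint.h>
    #include <sys/time.h>

    uint64_t GetPerformanceCounterSketch()
    {
        struct timeval tv;
        ::gettimeofday(&tv, NULL);
        // Cast tv_sec before multiplying: (uint64_t)(tv.tv_sec * 1000000)
        // multiplies in 32-bit 'long' first and overflows.
        return (uint64_t)tv.tv_sec * 1000000 + (uint64_t)tv.tv_usec;
    }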
Re comment #26, it appears that the GC takes care of freeing the blocks on the bits free list, but currently it does this before destroying the GCAllocs, which is presumably why the code is needed in ~GCAlloc. If the GC were to free the blocks on the bits free list after GCAlloc destruction then GCAlloc would not have to worry about it.
(In reply to comment #33)
> As tv.tv_sec is a time_t and time_t is normally 'long', the multiplication will
> normally overflow (after about 00:40 in the morning).

Never mind the 'after about 00:40 in the morning'; I was imagining something :-) The function as it stands is buggy but I'm not sure how much that matters, as the cast to uint64_t yields a sequence of increasing values, spaced 1M units apart if sampled once a second... At certain unfortunate moments that sequence will behave strangely because of the overflow - about every 2100 seconds or so, I guess, when the multiplication flips the number from positive to negative or vice versa and there's a significant drop in the 64-bit value.
Here's a simulation of the sequence of values at the time of one of those drops:

    ffffffffffcec8c0
    ffffffffffde0b00
    ffffffffffed4d40
    fffffffffffc8f80
    bd1c0
    1b1400
    2a5640
    399880
    48dac0
    581d00
    675f40
    76a180

The bug could have the impact of both making the system unresponsive (incremental marking gets to go on too long) and disabling collections for a long time (burst mode detection goes awry).
This code near the end of GC::Sweep is mysterious:

    if(heap->Config().gcstats) {
        int sweepResults = 0;
        GCAlloc::GCBlock *bb = smallEmptyPageList;
        while(bb) ...
        GCLargeAlloc::LargeBlock *lbb = largeEmptyPageList;
        while(lbb) ...
    }

smallEmptyPageList and largeEmptyPageList were cleared earlier in Sweep(), following a call to Finalize(). So far as I can tell the only way they could be refilled is by another call to Finalize(). Yet Finalize() is private to the GC and only called from that one point earlier in Sweep(). So I suspect the above code is ineffective.
Indeed, I don't recall writing code to iterate over those lists twice, especially when we whack the list after the first iteration. I think the sweepResults++ should be in the first iteration.
Blocks: 481413
No longer blocks: 478870
There are a bunch of portability problems that snuck in previously: naked fprintf calls in several files, and presumably naked use of stdio in general.
Depends on: 485966
Depends on: 485961
Depends on: 485963
Depends on: 485964
No longer depends on: 485964
Depends on: 485964
Depends on: 485967
Depends on: 485971
Depends on: 485972
New Item: VMPI_alloc() isn't hooked up to the OOM system. Return values from VMPI_alloc are not checked for a NULL pointer. I think we need an alloc abstraction in MMgc that is used by the rest of the code, calls VMPI_alloc in turn, and throws if it returns NULL.

What about the overloaded new operators in GCAllocObject? What if VMPI_alloc() within them returned NULL? The caller of new doesn't check for the NULL condition. The overloaded new should invoke the abstracted alloc function as well.

What about alloca/VMPI_alloca()? That could fail too.
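A minimal sketch of the abstraction being proposed: one MMgc-side entry point that all callers use, which funnels a NULL from VMPI_alloc into OOM handling. The failure-handler name is hypothetical; the point is only that callers never see NULL:

    #include <stddef.h>

    extern void* VMPI_alloc(size_t size);        // assumed signature; returns NULL on failure
    void SignalAllocationFailure(size_t size);   // hypothetical hook into the OOM system

    void* CheckedAlloc(size_t size)
    {
        void* p = VMPI_alloc(size);
        if (p == NULL)
            SignalAllocationFailure(size);       // aborts, throws, or triggers OOM handling
        return p;                                // callers never have to check for NULL
    }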
No longer blocks: 481413
(In reply to comment #41)
> New Item:
>
> VMPI_alloc() isn't hooked up to OOM system.

Foo, you're right. IMO the 'new' and 'delete' operators in GCAllocObject need to be banished, and it needs to be very clear indeed when the underlying allocator is being used for allocations like these. That would introduce enough hygiene for us to start talking about it. (The standard trick is placement new + a dedicated allocator abstraction, of course.) After that it becomes a question of finding out when these allocations can take place and whether a failure can upset an in-progress garbage collection; that must not be allowed to happen, because it means that a GC trying to prevent an OOM condition may not run to completion.
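A small, self-contained sketch of the "placement new + dedicated allocator abstraction" trick mentioned above. RawAlloc/RawFree and the ProfilerTable type are hypothetical stand-ins, not MMgc code; the point is that the allocation path is explicit and cannot silently route through FixedMalloc or GCAllocObject's operator new:

    #include <new>
    #include <cstdlib>

    // Hypothetical stand-ins for an explicit system-allocator abstraction.
    static void* RawAlloc(std::size_t n) { return std::malloc(n); }
    static void  RawFree(void* p)        { std::free(p); }

    struct ProfilerTable { int entries; ProfilerTable() : entries(0) {} };

    ProfilerTable* CreateTable()
    {
        void* mem = RawAlloc(sizeof(ProfilerTable));
        return mem ? new (mem) ProfilerTable() : 0;    // placement new: no hidden operator new
    }

    void DestroyTable(ProfilerTable* t)
    {
        if (t) {
            t->~ProfilerTable();                       // explicit destructor call
            RawFree(t);
        }
    }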
(In reply to comment #41)
> What about alloca/VMPI_alloca()? That could fail too.

Failure of alloca/VMPI_alloca should be treated as a stack overflow - an exception should always be thrown. But you're right, that's another work item. (Good thing we don't call naked alloca any more; VMPI_alloca can have a never-fail API.)
Depends on: 480984
(In reply to comment #43)
> (Good thing we don't call naked alloca any more; VMPI_alloca can have a
> never-fail API.)

Not completely true. MMgc code has calls to naked alloca in a few places.
(In reply to comment #44)
> (In reply to comment #43)
> > (Good thing we don't call naked alloca any more; VMPI_alloca can have a
> > never-fail API.)
>
> Not completely true. MMgc code has calls to naked alloca in a few places.

Then it needs to be fixed :-)
Depends on: 487104
"Resolved" since all problems have been logged as separate bugs.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → flash10.x
Status: RESOLVED → VERIFIED