Closed Bug 637695 Opened 14 years ago Closed 14 years ago

selftest basics testLocksNotHeld failure on 64-bit Mac

Categories

(Tamarin Graveyard :: Garbage Collection (mmGC), defect)

x86_64
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pnkfelix, Assigned: pnkfelix)

References

Details

(Whiteboard: has-patch)

Attachments

(5 files)

% ../configure.py --target=x86_64-darwin --enable-shell --enable-selftest % make -j4 % ./shell/avmshell -Dversion shell 1.4 release build 6011:de1f25d17b81 features AVMSYSTEM_64BIT;AVMSYSTEM_UNALIGNED_INT_ACCESS;AVMSYSTEM_UNALIGNED_FP_ACCESS;AVMSYSTEM_LITTLE_ENDIAN;AVMSYSTEM_AMD64;AVMSYSTEM_MAC;AVMFEATURE_JIT;AVMFEATURE_ABC_INTERP;AVMFEATURE_SELFTEST;AVMFEATURE_EVAL;AVMFEATURE_PROTECT_JITMEM;AVMFEATURE_SHARED_GCHEAP;AVMFEATURE_CACHE_GQCN;AVMFEATURE_SAFEPOINTS;AVMFEATURE_SWF12;AVMFEATURE_SWF13;AVMTWEAK_EXACT_TRACING; % ./shell/avmshell -Dselftest=mmgc,basics,lockObject ['start', 'mmgc', 'basics'] ['test', 'mmgc', 'basics', 'lockObject'] ['failure', 'mmgc', 'basics', 'lockObject', 'LockerAndUnlocker::testLocksNotHeld(gc)', 'ST_mmgc_basics.st', 360] ['end', 'mmgc', 'basics']
See Also: → 589172
This could be a problem of overly conservative retention, don't know yet. I'll poke at it today.
Assignee: nobody → fklockii
It is looking like conservative retention. My latest lightbulb moment was to leverage treilly's blacklist facility, so that the selftest is able to provide more information about the cases where it fails to reclaim the floating garbage. I'm running into some difficulties there, will have more information soon.
changeset: 6013:bf94f88a73ef user: Felix S Klock II <fklockii@adobe.com> summary: Bug 637695: short-term fix to buildbot results: disable false-failure (r=fklockii). As noted in comment added to selftest itself: // Conservative retention appears to be foiling the verify call below // Put back after identifying and eliminating the source of such // significant rentention (if possible). (I've been working on the latter, but decided it won't be done in time and other engineers should not be getting build failure messages.) http://hg.mozilla.org/tamarin-redux/rev/bf94f88a73ef
The selftest's structure is analogous to below (where the terminology "locks held" can be read as "objects retained, as intended"): create and lock objects perform a full collection test locks held, accessing the locked objects in the process perform a full collection unlock the locks perform a full collection test locks not held and thus objects reclaimed From inspecting the generated assembly, the conservative retention arises because: 1. All of the code above is made up of a series of relatively small function calls that all get inlined, 2. The short 10 iteration loops are completely unrolled and the results of the locked object accesses are assigned their own individual temporaries, and 3. The temporaries are spilled onto the C stack frame of the test itself. The first item can be semi-addressed by fixing Bug 569361 (something that has been on my back-burner for a while). That would probably also address item 3. But since the NO_INLINE directive suggested by Bug 569361 would only serve as a hint rather than a guarantee, it would be good to address Item 2 as a backup measure. Item 2 can probably be addressed by increasing the load of the test from the relatively small 10 objects to something more substantial that compilers won't unroll.
When you turn the MMGC_HEAP_GRAPH #define back on, you can make observations about the heap structure. This patch leverages that by blacklisting the objects right before we unlock them for the last time (which is when they should become unreachable). That way, if any are conservatively retained, we get notification as to why. (The paths included in that output is how I realized that so many of the objects were from distinct stack slots, which is what led to my realization that the loops were being unrolled.) This patch is optional, but since it has the right set of ifdef-guards, I don't know why we would not include it.
precursor for the suggested fix of increasing the object load.
Increasing the object load from 10 to 100 seems to stop gcc from unrolling the loop (or at least from completely unrolling it; a partial blockage should be enough to ensure that the stack won't keep everything alive).
Comment on attachment 516316 [details] [diff] [review] step1. turn falsely-failing selftest back on (this is trivial; I just figure I should set "r?" on precedents for the interesting patch)
Attachment #516316 - Flags: review?(lhansen)
Comment on attachment 516320 [details] [diff] [review] step2. proactive feedback on why test fails. (optional) I would not land the "#define MMGC_HEAP_GRAPH" in MMgc.h, obviously. But would you argue against landing the rest of these changes to the selftest? (There are probably other selftests where a similar pattern could be usefully applied, so setting a precedent one way or another is good.)
Attachment #516320 - Flags: review?(lhansen)
Comment on attachment 516321 [details] [diff] [review] step3. refactor the load (number of objects) into one definition an easy refactoring setting the stage for step4, which has the real fix.
Attachment #516321 - Flags: review?(lhansen)
Comment on attachment 516324 [details] [diff] [review] step4. increase load from 10 objects to 100 the most effective fix I found. (I'll post some other potential patches to the ticket as well in a few minutes, but this is the one that I think is the right one to land, and the others can probably be discarded.)
Attachment #516324 - Flags: review?(lhansen)
this approach (hinting that the functions should not be inlined) can be applied independently of step4 (increasing the load to avoid loop-unrolling). See comment 4 for some explanation of this. I suspect step4 will suffice on its own. But I thought I should include this patch on the ticket for completeness.
Depends on: 637992
Whiteboard: has-patch
Attachment #516316 - Flags: review?(lhansen) → review+
Comment on attachment 516320 [details] [diff] [review] step2. proactive feedback on why test fails. (optional) Did you mean for the -gcefficiency switch to be included here? I'm guessing not.
Attachment #516320 - Flags: review?(lhansen) → review+
Comment on attachment 516321 [details] [diff] [review] step3. refactor the load (number of objects) into one definition Always when one writes some splendidly unmaintainable code some busybody do-gooder comes along and cleans it up... ;-)
Attachment #516321 - Flags: review?(lhansen) → review+
Attachment #516324 - Flags: review?(lhansen) → review+
(In reply to comment #14) > Comment on attachment 516320 [details] [diff] [review] > step2. proactive feedback on why test fails. (optional) > > Did you mean for the -gcefficiency switch to be included here? I'm guessing > not. Good catch, the gcefficiency switch addition was an accident.
changeset - 6033:082dadfebe48 summary: Bug 637695: selftest basics testLocksNotHeld failure on 64-bit Mac (r=lhansen). http://hg.mozilla.org/tamarin-redux/rev/082dadfebe48
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: