Closed
Bug 525875
Opened 16 years ago
Closed 15 years ago
static GC write barriers broken (GC::GetGC is brittle)
Categories
(Tamarin Graveyard :: Garbage Collection (mmGC), defect, P1)
Tamarin Graveyard
Garbage Collection (mmGC)
Tracking
(Not tracked)
RESOLVED
FIXED
Q3 11 - Serrano
People
(Reporter: treilly, Assigned: treilly)
References
Details
Attachments
(1 file, 4 obsolete files)
15.06 KB,
patch
|
pnkfelix
:
review+
lhansen
:
superreview+
|
Details | Diff | Splinter Review |
/* static */ void GC::WriteBarrierRC(const void *address, const void *value)
/* static */ void GC::WriteBarrierRC_ctor(const void *address, const void *value)
/* static */ void GC::WriteBarrierRC_dtor(const void *address)
These all use GC::GetGC on address but address could be to the second page of a GC large alloc and will return garbage. We've fixed this before, likely a optimization regression, need self test!
Assignee | ||
Updated•16 years ago
|
Assignee: nobody → treilly
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → flash10.1
Comment 1•16 years ago
|
||
I didn't know that GetGC() was capable of barfing in this way, had thought it was valid for all gc-allocated objects. Is it possible to add a debug assertion at least?
Comment 2•16 years ago
|
||
This breakage is mine too, BTW, mea culpa. But can we add a big fat comment to GetGC() warning about this?
Assignee | ||
Comment 3•16 years ago
|
||
I still have to go back and see how it worked before, this issue came up in AIR where a C++ ScriptObject subclass (where the C++ declared slots come after the AS3 slots) declares WB protected fields can occur in the second page or a large allocation, not sure if this particular case is still relevant.
Assignee | ||
Comment 4•16 years ago
|
||
This appears broken in FP9 as well. I think maybe the "fix" was to eliminate uses of WriteBarrier<T> when it was possible to be located on a secondary page of a large alloc, that seems fragile. At a minimum we'll get some asserts in here to catch that case.
Assignee | ||
Comment 5•16 years ago
|
||
I can't think of a good way to fix this relying on GCHeap::GetActiveGC involves thread local memory which could be a lot slower. If used properly there's no issues and the assert should catch improper use.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
Comment 6•16 years ago
|
||
Reopening because it's a major issue we need to track, see bug #528977. It just bit me and at a minimum we need asserts to catch problems (I just got a bus error, even in a debug build).
Status: RESOLVED → REOPENED
OS: Mac OS X → All
Priority: P1 → --
Hardware: x86 → All
Resolution: WONTFIX → ---
Target Milestone: flash10.1 → ---
Updated•16 years ago
|
Status: REOPENED → NEW
Summary: static GC write barriers broken → static GC write barriers broken (GC::GetGC is brittle)
Updated•16 years ago
|
Assignee: treilly → nobody
Target Milestone: --- → Future
Comment 7•16 years ago
|
||
If GetGC is this fragile, we have to either fix it or nuke it. Nuking may be hard due to ubiquitous usage of DWB and friends, but...
Comment 8•16 years ago
|
||
(In reply to comment #7)
> If GetGC is this fragile, we have to either fix it or nuke it. Nuking may be
> hard due to ubiquitous usage of DWB and friends, but...
I agree, but the obvious fix is costly, namely, look up the page in the page table, check to see if it's a not-first large-object page, if it is iterate backward one page at a time until we find the beginning. That's much worse than masking off the lower 12 bits even in the common case where the pointer points to a small-object page or the first page of a large object.
I think the proper fix here is a card-marking write barrier. (That's desirable for other reasons too.)
Comment 9•16 years ago
|
||
(In reply to comment #8)
> I agree, but the obvious fix is costly
broken record: if the code doesn't have to be correct, we can make it arbitrarily fast...
Comment 10•16 years ago
|
||
(In reply to comment #9)
> (In reply to comment #8)
> > I agree, but the obvious fix is costly
>
> broken record: if the code doesn't have to be correct, we can make it
> arbitrarily fast...
Yes...
However, in this case the code is correct - from a certain point of view, which says that (a) write barrier performance is extremely important and (b) explicit write barriers using WB() and WB_NULL() don't normally occur at large offsets from the object start and hence we shouldn't slow down the system as a whole to support this edge case.
Assignee | ||
Updated•15 years ago
|
Assignee: nobody → treilly
Priority: -- → P1
Target Milestone: Future → flash10.x - Serrano
Assignee | ||
Comment 13•15 years ago
|
||
If this flies there will be follow opportunities to clean up places in the core we work around this bug.
Attachment #482862 -
Flags: superreview?(lhansen)
Attachment #482862 -
Flags: review?(fklockii)
Comment 14•15 years ago
|
||
Its an interesting idea. It relies on the write barrier code firing before the actual assignment takes place (as opposed to doing the assignment and then doing WB work afterward); but this is fine
You might adopt terms like "old_val" and "new_val" for comments and code rather than saying e.g. "thing at address" and "value"
Question: Can a null value contain distinct tags? If so, do they matter? Because you bomb out of the entire write if both the old_val and the new_val, once stripped of their tags, are null, but what if they are differently tagged null values?
Comment 15•15 years ago
|
||
(In reply to comment #14)
> Question: Can a null value contain distinct tags? If so, do they matter?
> Because you bomb out of the entire write if both the old_val and the new_val,
> once stripped of their tags, are null, but what if they are differently tagged
> null values?
even if this can come up, one potential "fix" is easy: just do the assignment itself before returning from GC::WriteBarrierRC. (So its really a question of whether this can happen, or if it would instead be a sign of a bug somewhere else and thus we could instead assert that the tags, if present, match.)
Assignee | ||
Comment 16•15 years ago
|
||
(In reply to comment #14)
> Its an interesting idea. It relies on the write barrier code firing before the
> actual assignment takes place (as opposed to doing the assignment and then
> doing WB work afterward); but this is fine
Yeah, this only covers the substitutions barriers used from manual write barriers and the smart pointers. If we got rid of manual write barriers we'd only have the smart pointer write barriers and the non-substitution barrier used by the JIT and we could probably remove some layers.
> You might adopt terms like "old_val" and "new_val" for comments and code rather
> than saying e.g. "thing at address" and "value"
Sure.
> Question: Can a null value contain distinct tags? If so, do they matter?
> Because you bomb out of the entire write if both the old_val and the new_val,
> once stripped of their tags, are null, but what if they are differently tagged
> null values?
The GC's policy has always been to ignore tags, there are at least 4 or 5 pointer tagging schemes used in MMgc client code. You raise a good point that a write barrier could have been used to change a NULL tag and we should either allow that or assert when it happens. I'll restructure the patch to allow it since I don't want to spend too much time on this now.
Assignee | ||
Comment 17•15 years ago
|
||
Comment on attachment 482862 [details] [diff] [review]
Fixes the bug using value at address or value but not address
cancelling review, needs reworking due to Felix's comments and I also realized I broke my write barrier edge graph with this change!
Attachment #482862 -
Flags: superreview?(lhansen)
Attachment #482862 -
Flags: review?(fklockii)
Assignee | ||
Comment 18•15 years ago
|
||
I'm going to still work on this as my hot item, I think fixing this will clean up a lot of stuff that will grease the skids for landing safegc (by enabling manual write barriers to go away in the player at least, we may still want to use manual wb's in the core for performance, ie to avoid find beginning and gc calculation).
Assignee | ||
Comment 19•15 years ago
|
||
Attachment #482862 -
Attachment is obsolete: true
Attachment #482986 -
Flags: review?
Assignee | ||
Updated•15 years ago
|
Attachment #482986 -
Flags: superreview?(lhansen)
Attachment #482986 -
Flags: review?(fklockii)
Attachment #482986 -
Flags: review?
Comment 20•15 years ago
|
||
Just to be clear - because this is a large change to very sensitive code: Is this change being driven by the SafeGC work? Are there alternatives to this change?
Comment 21•15 years ago
|
||
Comment on attachment 482986 [details] [diff] [review]
Take two, always have writes take place so corner cases like changing the ptr bits work
R- for 2 (fixable) reasons:
1. How can I tell, from the header documentation, what the difference is between WriteBarrierWriteRC and StaticInlineWriteBarrierRC? (Note that both are static and inline). Please revise text to distinguish them in some way.
2. In GC::WriteBarrierWriteRC: Where did the GCAssert(IsPointerIntoGCObject(address));" go? I don't see it in the calling contexts. Was it supposed to be in GC::WriteBarrierWrite? Or has the assertion been invalidated by other aspects of your change (and if so, is that sane?)?
Additional feedback (things that I would like addressed but would not R- if they were left undone):
Your earlier patch was somewhat easier to digest. The changes might be easier to validate if you first made a pure refactoring (no change to the semantics of how gc is extracted) that adds the new helpers like WriteBarrierSansWrite and the control-flow modifications, checked that in, and checked in the narrower change that introduces and uses GetGCFromAddressOrValue separately.
From looking at StaticInlineWriteBarrierRC's calls to WriteBarrierSansWrite and WriteBarrierWriteRC, as well as the code for the latter two, there is RC barrier code executed by WriteBarrierWriteRC that is not covered by WriteBarrierSansWrite. I found this decomposition a little surprising at first (as in, why is there barrier effort that is not included in WriteBarrierSansWrite).
The latter two points could probably be handled in tandem by devising appropriate terminology, in particular naming the two kinds of barrier work: barrier work that applies across all GCObjects, and additional barrier work that applies only to RCObjects. (Then just say e.g. WriteBarrierWriteRC assumes that the GC-barrier-work has already fired and it is taking care of the RC-barrier-work.)
Attachment #482986 -
Flags: review?(fklockii) → review-
Comment 22•15 years ago
|
||
(In reply to comment #21)
> The latter two points could probably be handled in tandem [...]
This sentence from the final paragraph did not make sense because I reordered my points but forgot to revise the last paragraph.
It should have been written as "The first point (1. above) and the last point (both about StaticInlineWriteBarrierRC and WriteBarrierWriteRC) could probably be handled in tandem [...]"
Comment 23•15 years ago
|
||
(In reply to comment #20)
> Just to be clear - because this is a large change to very sensitive code: Is
> this change being driven by the SafeGC work? Are there alternatives to this
> change?
Lars: Are you worried about the fundamental technique (that is, using GetGC(*address) rather than GetGC(address) ?) Or are you okay with that idea (which is working around GetGC's brittleness), and concerned about other aspects of this patch?
Comment 24•15 years ago
|
||
I'm concerned because the write barrier is performance-sensitive and this code appears (at first blush - I've not examined it carefully yet) to introduce more decisions into it, and that could translate into a slowdown.
Assignee | ||
Comment 25•15 years ago
|
||
(In reply to comment #20)
> Just to be clear - because this is a large change to very sensitive code: Is
> this change being driven by the SafeGC work? Are there alternatives to this
> change?
The alternatives are to keep using manual write barriers where we are to work around this issue and to keep ourselves open to additional pain/crashiness because of this. Particularly I'm worried about the avmglue classes and someone adding a new field and things blowing up on 64 bit and not getting caught.
Assignee | ||
Comment 26•15 years ago
|
||
(In reply to comment #24)
> I'm concerned because the write barrier is performance-sensitive and this code
> appears (at first blush - I've not examined it carefully yet) to introduce more
> decisions into it, and that could translate into a slowdown.
I'd planned to examine some release code to make sure all the inlining is working properly. There's some changes here that makes some bits faster and some that might slow things down as well. I'll get some perf #'s, do you know of any tests that are particularly WB perf sensitive?
Comment 27•15 years ago
|
||
v8.5/optimized/richards.abc hits WB's pretty hard.
Assignee | ||
Comment 28•15 years ago
|
||
Comment on attachment 482986 [details] [diff] [review]
Take two, always have writes take place so corner cases like changing the ptr bits work
Cancelling review, will update with a) feedback from Felix b) performance data and c) some simple prose describing the problem, solution and approach.
Attachment #482986 -
Attachment is obsolete: true
Attachment #482986 -
Flags: superreview?(lhansen)
Assignee | ||
Comment 29•15 years ago
|
||
The big change here is that we use the incoming value in the static write barriers to get the GC, not the address. This mean if the incoming value is NULL no write barrier happens. But this is only for the smart pointer case, the non-static write barriers haven't changed at all.
Attachment #484705 -
Flags: superreview?(lhansen)
Attachment #484705 -
Flags: review?(fklockii)
Assignee | ||
Comment 30•15 years ago
|
||
perf results are pretty much a wash as expected.
tommy-reillys-mac-pro:performance treilly$ ./runtests.py -i 5 --avm2 ~/builds/tamarin-redux/release/shell/avmshell --avm /tmp/avmshell v8.5/optimized/
Tamarin tests started: 2010-10-20 08:49:40.859647
Executing 9 test(s)
avm: /tmp/avmshell version: cyclone
avm2: /Users/treilly/builds/tamarin-redux/release/shell/avmshell version: cyclone
iterations: 5
avm avm2
test best avg best avg %dBst %dAvg
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/optimized/
crypto 3709 3704 3707 3698.6 -0.05 -0.15
deltablue 2753 2744.6 2771 2765 0.65 0.74 +
earley-boyer 920 914.2 915 911 -0.54 -0.35
raytrace 6960 6952 6867 6862 -1.34 -1.29 -
regexp 61.5 61.3 61 60.9 -0.81 -0.68 -
richards 3424 3417.8 3451 3444 0.79 0.77 +
splay 4894 4801.4 4908 4894.6 0.29 1.94
tommy-reillys-mac-pro:performance treilly$ ./runtests.py -i 5 --avm2 ~/builds/tamarin-redux/release/shell/avmshell --avm /tmp/avmshell jsbench/typed
Tamarin tests started: 2010-10-20 08:51:53.644573
Executing 10 test(s)
avm: /tmp/avmshell version: cyclone
avm2: /Users/treilly/builds/tamarin-redux/release/shell/avmshell version: cyclone
iterations: 5
avm avm2
test best avg best avg %dBst %dAvg
Metric: time
Dir: jsbench/typed/
Crypt 1010 1011 1009 1009.8 0.10 0.12
Euler 10160 10225.6 9995 10027 1.62 1.94 +
FFT 2601 2603.8 2603 2611.4 -0.08 -0.29
HeapSort 1676 1679.2 1677 1679.4 -0.06 -0.01
LUFact 2730 2730.6 2700 2700 1.10 1.12 +
Moldyn 4569 4581 4518 4533.4 1.12 1.04 +
RayTracer 1486 1488.8 1492 1494.2 -0.40 -0.36 -
SOR 8005 8010.2 8011 8016.2 -0.07 -0.07
Series 8756 8774.4 8713 8735.4 0.49 0.44 +
SparseMatmult 2447 2451.8 2444 2450.2 0.12 0.07
Comment 31•15 years ago
|
||
Comment on attachment 484705 [details] [diff] [review]
Don't rely on GetGC
R-: I haven't gone too deep into it; because my first issue from my earlier R- was not addressed: "How can I tell, from the header documentation, what the difference is between WriteBarrierWriteRC and StaticInlineWriteBarrierRC?"
(If someone convinces me that I'm wrong about making a fuss about this, I'll revert back to R? and do the rest of the review.)
Let me elaborate, since I'm worried I did not state the issue clearly enough the first time around:
The doc for the 1st one says: "Do a WriteBarrierWrite and take care of reference count updating."
The doc for the 2nd one says: "Perform the actual store of value into *address, adjusting reference counts."
(note that the doc for WriteBarrierWrite says that it performs the actual store of value into *address, so that is not a difference between the two. The actual difference is not mentioned at all in the docs.)
The actual difference between these two is that StaticInlineWriteBarrierRC is doing something extra in addition to what WriteBarrierWriteRC does. But this is not indicated at all in the docs, so one cannot determine which one is the right one to invoke without looking at their implementations.
Since we already seem to be okay with describing behavior by referring to other functions, why not describe StaticInlineWriteBarrierRC as: "composition of WriteBarrierCheckedTrap and WriteBarrierWriteRC?" True, its just barely abstracted from the code itself, but at least it provides information that distinguishes StaticInlineWriteBarrierRC from WriteBarrierWriteRC; a property the current docs cannot claim.
If the group decides that referring to other functions in the docs like that is too ugly, then I repeat my suggestion that we devise terminology to distinguish between RC and non-RC write-barrier work.
(Maybe the real issue here is that our current naming convention acts like "if the RC suffix is missing, that means it is nonRC stuff -- but this patch illustrates the problem with that convention: you cannot distinguish the composition of (RC + nonRC) material from RC material alone.)
Attachment #484705 -
Flags: review?(fklockii) → review-
Assignee | ||
Comment 32•15 years ago
|
||
Fair enough, if I'm gonna change this much I might as well clean up the docs.
Assignee | ||
Updated•15 years ago
|
Attachment #484705 -
Attachment is obsolete: true
Attachment #484705 -
Flags: superreview?(lhansen)
Assignee | ||
Comment 33•15 years ago
|
||
Sorry for the run-around Felix, hopefully this is moving in the write (yuck yuck) direction.
Attachment #484752 -
Flags: review?(fklockii)
Comment 34•15 years ago
|
||
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC
R-:
1. As I mentioned in comment 21, the following two assertions have been removed from WriteField (formerly known as WriteBarrierWrite):
GCAssert(!IsRCObjectSafe(value));
GCAssert(IsPointerIntoGCObject(address));
and I do not see the second assertion in the WriteField-calling contexts. I do not understand why it has been removed from WriteField. Can you please address this, either in the code or in this ticket?
2. The new doc above WriteBarrer{,RC,RC_ctor} says "The non-RC form must not be passed an RCObject value and an RCObject must not exist at address. Similarly the RC forms only accept RCObjects." This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion. The problem ( I think) is the last sentence in the doc; you can't just say "Similarly [...]" with such strong constraints from the preceding sentence (the last sentence refers to RCObjects (plural), which can easily be taken as a constraint on both value and *address). Change the last sentence to say directly what the actual constraints are for *address and value, without reference to the previous sentence.
(Writing good docs is hard!)
3. WriteFieldRC's header doc refers to "WriteBarrierWrite"; replace with "WriteField".
Attachment #484752 -
Flags: review?(fklockii) → review-
Assignee | ||
Comment 35•15 years ago
|
||
I'll fix the assertion, nice catch.
>This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is
> no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion.
I don't buy this, the comments say nothing about the object address points into. I explicitly say "value" and "at address". I'll try make it clearer.
> which can easily be taken as a constraint on both value and *address).
And it should, the contraints ARE on both value and *address. Maybe that's why you're confused, hopefully fixing the missing asserts so they match the docs will make it clearer.
Comment 36•15 years ago
|
||
(In reply to comment #35)
> >This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is
> > no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion.
>
> I don't buy this, the comments say nothing about the object address points
> into. I explicitly say "value" and "at address". I'll try make it clearer.
Ugh, I fell victim to confusion between referrer and referent.
1. I grant, I might have been less confused if it had said "at *address" rather than "at address."
2. I suspect the docs as written are still wrong for the same reason that I gave above: they describe a constraint in the second-to-last that is too strong to apply to the last sentence.
Consider: you are saying in the second-to-last sentence that it is incorrect to use the non-RC form to overwrite an RC at *address. Are you also trying to say in the last sentence that it is incorrect to use the RC form to overwrite a nonRC at *address? This is bogus, yes? (And that is why it is wrong to use "Similarly [...]"
Comment 37•15 years ago
|
||
(In reply to comment #36)
> in the last sentence that it is incorrect to use the RC form to overwrite a
> nonRC at *address? This is bogus, yes?
Never mind, I jumped the gun: it is *not* bogus. The constraint is entirely sensible.
Comment 38•15 years ago
|
||
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC
Okay, so just to summarize: I retract my second objection from comment 34. (I interpreted "an RCOjbect must not exist at address" as being a constraint on the object containing address, rather than a constraint on the object referred to by *address.)
Tommy already acknowledged the first point, and the third point is trivial to fix. I'll switch to R+ now since I think I won't need to re-review once Tommy addresses points 1 and 3.
Attachment #484752 -
Flags: review- → review+
Comment 39•15 years ago
|
||
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC
retracting R+; I changed my mind again. I want to re-review because I short-cut my last review and I want to make sure I give the next revision a proper review so that Lars can be confident during his SR.
Attachment #484752 -
Flags: review+
Assignee | ||
Comment 40•15 years ago
|
||
The plot thickens, doing full non-RC asserts if running a foul the manual WB in MultinameHashtable where a WB is used on value's that are "Binding"s. I think a Binding can be a GC pointer or a abc offset. Will have to get to the bottom of that. The old code only asserted that value wasn't an rcobject so it slipped through the cracks.
Assignee | ||
Comment 41•15 years ago
|
||
WE blocker: #2744511
Assignee | ||
Comment 42•15 years ago
|
||
Note that this fix for MMgc::WriteBarrier and MMgc::WriteBarrierRC should be extended to avmplus::AtomWriteBarrier (ie it has the same problem by using GetGC)
Assignee | ||
Updated•15 years ago
|
Blocks: safegc-tracker
Assignee | ||
Updated•15 years ago
|
No longer blocks: safegc-tracker
Comment 43•15 years ago
|
||
As a fix for this bug would simplify the GC API and thus be part of the SafeGC goal, we should continue to look for solutions. No activity for two weeks even if this is a "Major P1" bug. Please update the bug ASAP with current status.
Blocks: 576307
Assignee | ||
Comment 44•15 years ago
|
||
Update: I pushed the blocker and have a patch ready but one of my new asserts fired on win64 so I need to investigate that first.
Assignee | ||
Comment 45•15 years ago
|
||
msvc is adding 8 to pointer values in GCList<E4XNode>::add and subtracting 8 in get. Only 64 bit (haven't tried release) and only for E4XNode apparently. My new write barriers don't like this shenanigans and want a real pointer to the beginning of the object not a derived pointer.
Assignee | ||
Comment 46•15 years ago
|
||
Attachment #484752 -
Attachment is obsolete: true
Attachment #494810 -
Flags: superreview?(lhansen)
Attachment #494810 -
Flags: review?(fklockii)
Comment 47•15 years ago
|
||
(In reply to comment #46)
> Created attachment 494810 [details] [diff] [review]
> rebased passing sandbox w/ dependent fixes (some still pending review)
Question: This code originally used GetGC to extract the GC object by inspecting the data itself. Your earlier solutions to the problem of reliable extraction replaced GetGC with GetGCFromAddressOrValue, so we would still attempt to extract the appropriate GC by inspecting the objects themselves (which the old code attempted to do but was passing in an offset address which was not valid for large objects).
Your new solution does not do this; instead it replaces GetGC with GetActiveGC, which extracts the GC from the execution context rather than inspecting the objects on the heap. Is this guaranteed to behave in the same manner? That is, will the two results be the same GC, and if that is not necessarily so, is there some invariant ensuring that the difference in GC does not matter? (Pointers to relevant documentation welcome.)
Comment 48•15 years ago
|
||
(In reply to comment #47)
> That
> is, will the two results be the same GC, and if that is not necessarily so, is
> there some invariant ensuring that the difference in GC does not matter?
> (Pointers to relevant documentation welcome.)
Quick follow-up: it does seem like it is just assertion code using the extracted GC, and the code is careful to use functions like IsRCObjectSafe. But even those methods, while safe (in the sense that they will not crash or assert), are still using per-gc structure like the pagemap (used in FindBeginningGuarded). So I am a little concerned that the difference in GC could matter, though it seems like it could "at most" introduce false positives and/or negatives in the assertions. . .
Comment 49•15 years ago
|
||
Comment on attachment 494810 [details] [diff] [review]
rebased passing sandbox w/ dependent fixes (some still pending review)
r- until i get answers for comment 47 and comment 48 (at which point I'll continue the review).
Attachment #494810 -
Flags: review?(fklockii) → review-
Assignee | ||
Comment 50•15 years ago
|
||
You answered the question pretty well I think. We have two choices to reliably get the GC, extract it from the value, the value at address or the thread local. The address itself isn't reliable and with this patch we never attempt to get it from there. We want to do assertion checking even if we can't get it from the value (b/c its NULL), hence we use GetActiveGC. My earlier patches also attempted to get the GC from the value at address but I realized that if value was NULL we could skip the write barrier so I just made everything in that case static and used GetActiveGC to perform the debug checking. The extra load of the value at address was unnecessary but more importantly affected performance noticably.
Comment 51•15 years ago
|
||
(In reply to comment #50)
> You answered the question pretty well I think.
I don't think I did. I still don't know:
-- will the two results be the same GC
-- if they are not the same GC, can the assertions yield false passes (which is acceptable to me) or false failures (which is not acceptable).
Assignee | ||
Comment 52•15 years ago
|
||
Yes they should always be the same if the client follows the rules. If they aren't we'll assert/crash.
If they are not the same GC something went horribly wrong, basically it means a MMGC_GCENTER was missing or something whacky happened. Some of the asserts will pass with a different GC but FindBeginning will assert and return NULL since it uses the pagemap, if you continue pass the assert we'll crash.
Comment 53•15 years ago
|
||
(In reply to comment #52)
> Yes they should always be the same if the client follows the rules. If they
> aren't we'll assert/crash.
Hmm. I'd go along with this reasoning, but I noticed that GC::GetGC at one point asserted that gc == GetActiveGC, but Steven commented out that assertion in TR changeset 2999:
http://hg.mozilla.org/tamarin-redux/rev/1e3ad060a02a
So it sounds to me like the assumption is broken somewhere.
Assignee | ||
Comment 54•15 years ago
|
||
That assert is valid we just need to fix the clients to not violate the rules in order to enable it ;-)
Assignee | ||
Comment 55•15 years ago
|
||
Even if turns out we need to allow GetGC to be used when the active GC isn't set, it would probably be wrong to allow a write barrier to fire when the active GC isn't set, if for no other reason simply to enable us to ensure that the write barriers are being used correctly.
Comment 56•15 years ago
|
||
(In reply to comment #54)
> That assert is valid we just need to fix the clients to not violate the rules
> in order to enable it ;-)
Filed as Bug 616477
That bug should not hold up review of this one (since this discussion is contained to assertion code and those should not hold up fixing the real underlying bug here). I'll resume review.
Updated•15 years ago
|
Attachment #494810 -
Flags: review- → review?(fklockii)
Comment 57•15 years ago
|
||
Comment on attachment 494810 [details] [diff] [review]
rebased passing sandbox w/ dependent fixes (some still pending review)
Looks good, now that I grok the basis for the switch to GetActiveGC().
Uber-nit: patch introduces extraneous trailing whitespace immediately above and within GC::InlineWriteBarrierCheckedTrap.
Attachment #494810 -
Flags: review?(fklockii) → review+
Comment 58•15 years ago
|
||
InlineWriteBarrierGuardedTrap would be a better name for what you call InlineWriteBarrierCheckedTrap ("check" normally having the meaning of "throw an error if it doesn't hold").
Speaking of that method, its invariant is weaker than before in the case of RC storage (no tests on the low bits of container and pointer). Not obvious that it matters. Discuss.
I'm not ecstatic about the extra condition introduced into the barrier for the null test but it does seem like the only rational approach; calling GetActiveGC() on the hot path seems less desirable.
Updated•15 years ago
|
Attachment #494810 -
Flags: superreview?(lhansen) → superreview+
Assignee | ||
Comment 59•15 years ago
|
||
Player blocker Watson 2768575 fixed. Will push this today
Assignee | ||
Comment 60•15 years ago
|
||
(In reply to comment #58)
> InlineWriteBarrierGuardedTrap would be a better name for what you call
> InlineWriteBarrierCheckedTrap ("check" normally having the meaning of "throw an
> error if it doesn't hold").
Cool, will go with Guarded.
> Speaking of that method, its invariant is weaker than before in the case of RC
> storage (no tests on the low bits of container and pointer). Not obvious that
> it matters. Discuss.
I cleaned this up so that the address assertion moves to WriteField so all paths assert that now. The container assert isn't necessary because IsPointerToGCObject will cover it, I've removed it.
> I'm not ecstatic about the extra condition introduced into the barrier for the
> null test but it does seem like the only rational approach; calling
> GetActiveGC() on the hot path seems less desirable.
Yeah, its a trade off but a good one I think.
Comment 61•15 years ago
|
||
changeset: 5634:187d01e5fc76
user: Tommy Reilly <treilly@adobe.com>
summary: Bug 525875 - static GC write barriers broken (GC::GetGC is brittle) (r=fklockii,sr=lhansen)
http://hg.mozilla.org/tamarin-redux/rev/187d01e5fc76
Assignee | ||
Updated•15 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 16 years ago → 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Flags: flashplayer-bug+
You need to log in
before you can comment on or make changes to this bug.
Description
•