static GC write barriers broken (GC::GetGC is brittle)

RESOLVED FIXED in Q3 11 - Serrano

Status

Tamarin
Garbage Collection (mmGC)
P1
major
RESOLVED FIXED
9 years ago
7 years ago

People

(Reporter: Tommy Reilly, Assigned: Tommy Reilly)

Tracking

(Blocks: 1 bug)

unspecified
Q3 11 - Serrano
Dependency tree / graph
Bug Flags:
flashplayer-bug +

Details

Attachments

(1 attachment, 4 obsolete attachments)

(Assignee)

Description

9 years ago
/* static */ void GC::WriteBarrierRC(const void *address, const void *value)	
	/* static */ void GC::WriteBarrierRC_ctor(const void *address, const void *value)	
	/* static */ void GC::WriteBarrierRC_dtor(const void *address)

These all use GC::GetGC on address but address could be to the second page of a GC large alloc and will return garbage.   We've fixed this before, likely a optimization regression, need self test!
(Assignee)

Updated

9 years ago
Assignee: nobody → treilly
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → flash10.1

Comment 1

9 years ago
I didn't know that GetGC() was capable of barfing in this way, had thought it was valid for all gc-allocated objects. Is it possible to add a debug assertion at least?

Comment 2

9 years ago
This breakage is mine too, BTW, mea culpa. But can we add a big fat comment to GetGC() warning about this?
(Assignee)

Comment 3

9 years ago
I still have to go back and see how it worked before, this issue came up in AIR where a C++ ScriptObject subclass (where the C++ declared slots come after the AS3 slots) declares WB protected fields can occur in the second page or a large allocation, not sure if this particular case is still relevant.
(Assignee)

Comment 4

9 years ago
This appears broken in FP9 as well.   I think maybe the "fix" was to eliminate uses of WriteBarrier<T> when it was possible to be located on a secondary page of a large alloc, that seems fragile.  At a minimum we'll get some asserts in here to catch that case.
(Assignee)

Comment 5

9 years ago
I can't think of a good way to fix this relying on GCHeap::GetActiveGC involves thread local memory which could be a lot slower.  If used properly there's no issues and the assert should catch improper use.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WONTFIX

Comment 6

9 years ago
Reopening because it's a major issue we need to track, see bug #528977.  It just bit me and at a minimum we need asserts to catch problems (I just got a bus error, even in a debug build).
Status: RESOLVED → REOPENED
OS: Mac OS X → All
Priority: P1 → --
Hardware: x86 → All
Resolution: WONTFIX → ---
Target Milestone: flash10.1 → ---

Updated

9 years ago
Status: REOPENED → NEW
Summary: static GC write barriers broken → static GC write barriers broken (GC::GetGC is brittle)

Updated

9 years ago
Assignee: treilly → nobody
Target Milestone: --- → Future

Comment 7

9 years ago
If GetGC is this fragile, we have to either fix it or nuke it. Nuking may be hard due to ubiquitous usage of DWB and friends, but...

Comment 8

9 years ago
(In reply to comment #7)
> If GetGC is this fragile, we have to either fix it or nuke it. Nuking may be
> hard due to ubiquitous usage of DWB and friends, but...

I agree, but the obvious fix is costly, namely, look up the page in the page table, check to see if it's a not-first large-object page, if it is iterate backward one page at a time until we find the beginning.  That's much worse than masking off the lower 12 bits even in the common case where the pointer points to a small-object page or the first page of a large object.

I think the proper fix here is a card-marking write barrier.  (That's desirable for other reasons too.)

Comment 9

9 years ago
(In reply to comment #8)
> I agree, but the obvious fix is costly

broken record: if the code doesn't have to be correct, we can make it arbitrarily fast...

Comment 10

9 years ago
(In reply to comment #9)
> (In reply to comment #8)
> > I agree, but the obvious fix is costly
> 
> broken record: if the code doesn't have to be correct, we can make it
> arbitrarily fast...

Yes...

However, in this case the code is correct - from a certain point of view, which says that (a) write barrier performance is extremely important and (b) explicit write barriers using WB() and WB_NULL() don't normally occur at large offsets from the object start and hence we shouldn't slow down the system as a whole to support this edge case.

Updated

8 years ago
Depends on: 568977

Updated

8 years ago
Duplicate of this bug: 568977
(Assignee)

Comment 12

8 years ago
I just
Status: NEW → ASSIGNED
(Assignee)

Updated

8 years ago
Assignee: nobody → treilly
Priority: -- → P1
Target Milestone: Future → flash10.x - Serrano
(Assignee)

Comment 13

8 years ago
Created attachment 482862 [details] [diff] [review]
Fixes the bug using value at address or value but not address

If this flies there will be follow opportunities to clean up places in the core we work around this bug.
Attachment #482862 - Flags: superreview?(lhansen)
Attachment #482862 - Flags: review?(fklockii)
Its an interesting idea.  It relies on the write barrier code firing before the actual assignment takes place (as opposed to doing the assignment and then doing WB work afterward); but this is fine


You might adopt terms like "old_val" and "new_val" for comments and code rather than saying e.g. "thing at address" and "value"

Question: Can a null value contain distinct tags?  If so, do they matter?  Because you bomb out of the entire write if both the old_val and the new_val, once stripped of their tags, are null, but what if they are differently tagged null values?
(In reply to comment #14)
> Question: Can a null value contain distinct tags?  If so, do they matter? 
> Because you bomb out of the entire write if both the old_val and the new_val,
> once stripped of their tags, are null, but what if they are differently tagged
> null values?

even if this can come up, one potential "fix" is easy: just do the assignment itself before returning from GC::WriteBarrierRC.  (So its really a question of whether this can happen, or if it would instead be a sign of a bug somewhere else and thus we could instead assert that the tags, if present, match.)
(Assignee)

Comment 16

8 years ago
(In reply to comment #14)
> Its an interesting idea.  It relies on the write barrier code firing before the
> actual assignment takes place (as opposed to doing the assignment and then
> doing WB work afterward); but this is fine

Yeah, this only covers the substitutions barriers used from manual write barriers and the smart pointers.   If we got rid of manual write barriers we'd only have the smart pointer write barriers and the non-substitution barrier used by the JIT and we could probably remove some layers.

> You might adopt terms like "old_val" and "new_val" for comments and code rather
> than saying e.g. "thing at address" and "value"

Sure.
 
> Question: Can a null value contain distinct tags?  If so, do they matter? 
> Because you bomb out of the entire write if both the old_val and the new_val,
> once stripped of their tags, are null, but what if they are differently tagged
> null values?

The GC's policy has always been to ignore tags, there are at least 4 or 5 pointer tagging schemes used in MMgc client code.   You raise a good point that a write barrier could have been used to change a NULL tag and we should either allow that or assert when it happens.   I'll restructure the patch to allow it since I don't want to spend too much time on this now.
(Assignee)

Comment 17

8 years ago
Comment on attachment 482862 [details] [diff] [review]
Fixes the bug using value at address or value but not address

cancelling review, needs reworking due to Felix's comments and I also realized I broke my write barrier edge graph with this change!
Attachment #482862 - Flags: superreview?(lhansen)
Attachment #482862 - Flags: review?(fklockii)
(Assignee)

Comment 18

8 years ago
I'm going to still work on this as my hot item, I think fixing this will clean up a lot of stuff that will grease the skids for landing safegc (by enabling manual write barriers to go away in the player at least, we may still want to use manual wb's in the core for performance, ie to avoid find beginning and gc calculation).
(Assignee)

Comment 19

8 years ago
Created attachment 482986 [details] [diff] [review]
Take two, always have writes take place so corner cases like changing the ptr bits work
Attachment #482862 - Attachment is obsolete: true
Attachment #482986 - Flags: review?
(Assignee)

Updated

8 years ago
Attachment #482986 - Flags: superreview?(lhansen)
Attachment #482986 - Flags: review?(fklockii)
Attachment #482986 - Flags: review?

Comment 20

8 years ago
Just to be clear - because this is a large change to very sensitive code: Is this change being driven by the SafeGC work?  Are there alternatives to this change?
Comment on attachment 482986 [details] [diff] [review]
Take two, always have writes take place so corner cases like changing the ptr bits work

R- for 2 (fixable) reasons:

1. How can I tell, from the header documentation, what the difference is between WriteBarrierWriteRC and StaticInlineWriteBarrierRC?  (Note that both are static and inline).  Please revise text to distinguish them in some way.

2. In GC::WriteBarrierWriteRC: Where did the GCAssert(IsPointerIntoGCObject(address));" go?  I don't see it in the calling contexts.  Was it supposed to be in GC::WriteBarrierWrite?  Or has the assertion been invalidated by other aspects of your change (and if so, is that sane?)?

Additional feedback (things that I would like addressed but would not R- if they were left undone):

Your earlier patch was somewhat easier to digest.  The changes might be easier to validate if you first made a pure refactoring (no change to the semantics of how gc is extracted) that adds the new helpers like WriteBarrierSansWrite and the control-flow modifications, checked that in, and checked in the narrower change that introduces and uses GetGCFromAddressOrValue separately.

From looking at StaticInlineWriteBarrierRC's calls to WriteBarrierSansWrite and  WriteBarrierWriteRC, as well as the code for the latter two, there is RC barrier code executed by WriteBarrierWriteRC that is not covered by WriteBarrierSansWrite.  I found this decomposition a little surprising at first (as in, why is there barrier effort that is not included in WriteBarrierSansWrite).

The latter two points could probably be handled in tandem by devising appropriate terminology, in particular naming the two kinds of barrier work: barrier work that applies across all GCObjects, and additional barrier work that applies only to RCObjects.  (Then just say e.g. WriteBarrierWriteRC assumes that the GC-barrier-work has already fired and it is taking care of the RC-barrier-work.)
Attachment #482986 - Flags: review?(fklockii) → review-
(In reply to comment #21)
> The latter two points could probably be handled in tandem [...]

This sentence from the final paragraph did not make sense because I reordered my points but forgot to revise the last paragraph.

It should have been written as "The first point (1. above) and the last point (both about StaticInlineWriteBarrierRC and WriteBarrierWriteRC) could probably be handled in tandem [...]"
(In reply to comment #20)
> Just to be clear - because this is a large change to very sensitive code: Is
> this change being driven by the SafeGC work?  Are there alternatives to this
> change?

Lars: Are you worried about the fundamental technique (that is, using GetGC(*address) rather than GetGC(address) ?)  Or are you okay with that idea (which is working around GetGC's brittleness), and concerned about other aspects of this patch?

Comment 24

8 years ago
I'm concerned because the write barrier is performance-sensitive and this code appears (at first blush - I've not examined it carefully yet) to introduce more decisions into it, and that could translate into a slowdown.
(Assignee)

Comment 25

8 years ago
(In reply to comment #20)
> Just to be clear - because this is a large change to very sensitive code: Is
> this change being driven by the SafeGC work?  Are there alternatives to this
> change?

The alternatives are to keep using manual write barriers where we are to work around this issue and to keep ourselves open to additional pain/crashiness because of this.   Particularly I'm worried about the avmglue classes and someone adding a new field and things blowing up on 64 bit and not getting caught.
(Assignee)

Comment 26

8 years ago
(In reply to comment #24)
> I'm concerned because the write barrier is performance-sensitive and this code
> appears (at first blush - I've not examined it carefully yet) to introduce more
> decisions into it, and that could translate into a slowdown.

I'd planned to examine some release code to make sure all the inlining is working properly.   There's some changes here that makes some bits faster and some that might slow things down as well.  I'll get some perf #'s, do you know of any tests that are particularly WB perf sensitive?

Comment 27

8 years ago
v8.5/optimized/richards.abc hits WB's pretty hard.
(Assignee)

Comment 28

8 years ago
Comment on attachment 482986 [details] [diff] [review]
Take two, always have writes take place so corner cases like changing the ptr bits work

Cancelling review, will update with a) feedback from Felix b) performance data and c) some simple prose describing the problem, solution and approach.
Attachment #482986 - Attachment is obsolete: true
Attachment #482986 - Flags: superreview?(lhansen)
(Assignee)

Comment 29

8 years ago
Created attachment 484705 [details] [diff] [review]
Don't rely on GetGC

The big change here is that we use the incoming value in the static write barriers to get the GC, not the address.  This mean if the incoming value is NULL no write barrier happens.  But this is only for the smart pointer case, the non-static write barriers haven't changed at all.
Attachment #484705 - Flags: superreview?(lhansen)
Attachment #484705 - Flags: review?(fklockii)
(Assignee)

Comment 30

8 years ago
perf results are pretty much a wash as expected.

tommy-reillys-mac-pro:performance treilly$ ./runtests.py -i 5 --avm2 ~/builds/tamarin-redux/release/shell/avmshell --avm /tmp/avmshell v8.5/optimized/
Tamarin tests started: 2010-10-20 08:49:40.859647
Executing 9 test(s)
avm: /tmp/avmshell  version: cyclone
avm2: /Users/treilly/builds/tamarin-redux/release/shell/avmshell  version: cyclone
iterations: 5

                               avm             avm2          
test                           best     avg    best     avg   %dBst   %dAvg
Metric: v8 custom v8 normalized metric (hardcoded in the test)
Dir: v8.5/optimized/
  crypto                       3709    3704    3707  3698.6   -0.05   -0.15     
  deltablue                    2753  2744.6    2771    2765    0.65    0.74 +   
  earley-boyer                  920   914.2     915     911   -0.54   -0.35     
  raytrace                     6960    6952    6867    6862   -1.34   -1.29 -   
  regexp                       61.5    61.3      61    60.9   -0.81   -0.68 -   
  richards                     3424  3417.8    3451    3444    0.79    0.77 +   
  splay                        4894  4801.4    4908  4894.6    0.29    1.94     
tommy-reillys-mac-pro:performance treilly$ ./runtests.py -i 5 --avm2 ~/builds/tamarin-redux/release/shell/avmshell --avm /tmp/avmshell jsbench/typed
Tamarin tests started: 2010-10-20 08:51:53.644573
Executing 10 test(s)
avm: /tmp/avmshell  version: cyclone
avm2: /Users/treilly/builds/tamarin-redux/release/shell/avmshell  version: cyclone
iterations: 5

                               avm             avm2          
test                           best     avg    best     avg   %dBst   %dAvg
Metric: time 
Dir: jsbench/typed/
  Crypt                        1010    1011    1009  1009.8    0.10    0.12     
  Euler                       10160  10225.6    9995   10027    1.62    1.94 +   
  FFT                          2601  2603.8    2603  2611.4   -0.08   -0.29     
  HeapSort                     1676  1679.2    1677  1679.4   -0.06   -0.01     
  LUFact                       2730  2730.6    2700    2700    1.10    1.12 +   
  Moldyn                       4569    4581    4518  4533.4    1.12    1.04 +   
  RayTracer                    1486  1488.8    1492  1494.2   -0.40   -0.36 -   
  SOR                          8005  8010.2    8011  8016.2   -0.07   -0.07     
  Series                       8756  8774.4    8713  8735.4    0.49    0.44 +   
  SparseMatmult                2447  2451.8    2444  2450.2    0.12    0.07
Comment on attachment 484705 [details] [diff] [review]
Don't rely on GetGC

R-: I haven't gone too deep into it; because my first issue from my earlier R- was not addressed: "How can I tell, from the header documentation, what the difference is between WriteBarrierWriteRC and StaticInlineWriteBarrierRC?"

(If someone convinces me that I'm wrong about making a fuss about this, I'll revert back to R? and do the rest of the review.)

Let me elaborate, since I'm worried I did not state the issue clearly enough the first time around:

The doc for the 1st one says: "Do a WriteBarrierWrite and take care of reference count updating."
The doc for the 2nd one says: "Perform the actual store of value into *address, adjusting reference counts."
(note that the doc for WriteBarrierWrite says that it performs the actual store of value into *address, so that is not a difference between the two.  The actual difference is not mentioned at all in the docs.)

The actual difference between these two is that StaticInlineWriteBarrierRC is doing something extra in addition to what WriteBarrierWriteRC does.  But this is not indicated at all in the docs, so one cannot determine which one is the right one to invoke without looking at their implementations.

Since we already seem to be okay with describing behavior by referring to other functions, why not describe StaticInlineWriteBarrierRC as: "composition of WriteBarrierCheckedTrap and WriteBarrierWriteRC?"  True, its just barely abstracted from the code itself, but at least it provides information that distinguishes StaticInlineWriteBarrierRC from WriteBarrierWriteRC; a property the current docs cannot claim.

If the group decides that referring to other functions in the docs like that is too ugly, then I repeat my suggestion that we devise terminology to distinguish between RC and non-RC write-barrier work.

(Maybe the real issue here is that our current naming convention acts like "if the RC suffix is missing, that means it is nonRC stuff -- but this patch illustrates the problem with that convention: you cannot distinguish the composition of (RC + nonRC) material from RC material alone.)
Attachment #484705 - Flags: review?(fklockii) → review-
(Assignee)

Comment 32

8 years ago
Fair enough, if I'm gonna change this much I might as well clean up the docs.
(Assignee)

Updated

8 years ago
Attachment #484705 - Attachment is obsolete: true
Attachment #484705 - Flags: superreview?(lhansen)
(Assignee)

Comment 33

8 years ago
Created attachment 484752 [details] [diff] [review]
don't rely on GetGC

Sorry for the run-around Felix, hopefully this is moving in the write (yuck yuck) direction.
Attachment #484752 - Flags: review?(fklockii)
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC

R-:

1. As I mentioned in comment 21, the following two assertions have been removed from WriteField (formerly known as WriteBarrierWrite):
    GCAssert(!IsRCObjectSafe(value));
    GCAssert(IsPointerIntoGCObject(address));
and I do not see the second assertion in the WriteField-calling contexts.  I do not understand why it has been removed from WriteField.  Can you please address this, either in the code or in this ticket?

2. The new doc above WriteBarrer{,RC,RC_ctor} says "The non-RC form must not be passed an RCObject value and an RCObject must not exist at address. Similarly the RC forms only accept RCObjects."  This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion.  The problem ( I think) is the last sentence in the doc; you can't just say "Similarly [...]" with such strong constraints from the preceding sentence (the last sentence refers to RCObjects (plural), which can easily be taken as a constraint on both value and *address).  Change the last sentence to say directly what the actual constraints are for *address and value, without reference to the previous sentence.

(Writing good docs is hard!)

3. WriteFieldRC's header doc refers to "WriteBarrierWrite"; replace with "WriteField".
Attachment #484752 - Flags: review?(fklockii) → review-
(Assignee)

Comment 35

8 years ago
I'll fix the assertion, nice catch.

>This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is 
> no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion.  

I don't buy this, the comments say nothing about the object address points into.  I explicitly say "value" and "at address".    I'll try make it clearer.

> which can easily be taken as a constraint on both value and *address).

And it should, the contraints ARE on both value and *address.   Maybe that's why you're confused, hopefully fixing the missing asserts so they match the docs will make it clearer.
(In reply to comment #35)
> >This makes it sound like you can only have reference arcs RC->RC and nonRC->nonRC, ie that there is 
> > no way to have an arc from an RC->nonRC or nonRC->RC; but that is an absurd conclusion.  
> 
> I don't buy this, the comments say nothing about the object address points
> into.  I explicitly say "value" and "at address".    I'll try make it clearer.

Ugh, I fell victim to confusion between referrer and referent.

1. I grant, I might have been less confused if it had said "at *address" rather than "at address."

2. I suspect the docs as written are still wrong for the same reason that I gave above: they describe a constraint in the second-to-last that is too strong to apply to the last sentence.

Consider: you are saying in the second-to-last sentence that it is incorrect to use the non-RC form to overwrite an RC at *address.  Are you also trying to say in the last sentence that it is incorrect to use the RC form to overwrite a nonRC at *address?  This is bogus, yes?  (And that is why it is wrong to use "Similarly [...]"
(In reply to comment #36)
> in the last sentence that it is incorrect to use the RC form to overwrite a
> nonRC at *address?  This is bogus, yes?

Never mind, I jumped the gun: it is *not* bogus.  The constraint is entirely sensible.
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC

Okay, so just to summarize: I retract my second objection from comment 34.  (I interpreted "an RCOjbect must not exist at address" as being a constraint on the object containing address, rather than a constraint on the object referred to by *address.)

Tommy already acknowledged the first point, and the third point is trivial to fix.  I'll switch to R+ now since I think I won't need to re-review once Tommy addresses points 1 and 3.
Attachment #484752 - Flags: review- → review+
Comment on attachment 484752 [details] [diff] [review]
don't rely on GetGC

retracting R+; I changed my mind again.  I want to re-review because I short-cut my last review and I want to make sure I give the next revision a proper review so that Lars can be confident during his SR.
Attachment #484752 - Flags: review+
(Assignee)

Comment 40

8 years ago
The plot thickens, doing full non-RC asserts if running a foul the manual WB in MultinameHashtable where a WB is used on value's that are "Binding"s.   I think a Binding can be a GC pointer or a abc offset.  Will have to get to the bottom of that.   The old code only asserted that value wasn't an rcobject so it slipped through the cracks.
(Assignee)

Comment 41

8 years ago
WE blocker: #2744511
(Assignee)

Updated

8 years ago
Depends on: 607340
(Assignee)

Updated

8 years ago
Blocks: 565664
(Assignee)

Updated

8 years ago
Depends on: 606236
(Assignee)

Comment 42

8 years ago
Note that this fix for MMgc::WriteBarrier and MMgc::WriteBarrierRC should be extended to avmplus::AtomWriteBarrier (ie it has the same problem by using GetGC)
(Assignee)

Updated

8 years ago
Blocks: 607651
(Assignee)

Updated

8 years ago
No longer blocks: 607651

Comment 43

8 years ago
As a fix for this bug would simplify the GC API and thus be part of the SafeGC goal, we should continue to look for solutions.  No activity for two weeks even if this is a "Major P1" bug.  Please update the bug ASAP with current status.
Blocks: 576307

Updated

8 years ago
No longer blocks: 576307
(Assignee)

Comment 44

8 years ago
Update: I pushed the blocker and have a patch ready but one of my new asserts fired on win64 so I need to investigate that first.
(Assignee)

Comment 45

8 years ago
msvc is adding 8 to pointer values in GCList<E4XNode>::add and subtracting 8 in get.   Only 64 bit (haven't tried release) and only for E4XNode apparently.   My new write barriers don't like this shenanigans and want a real pointer to the beginning of the object not a derived pointer.
(Assignee)

Updated

8 years ago
Depends on: 615964
(Assignee)

Updated

8 years ago
Depends on: 616125
(Assignee)

Comment 46

8 years ago
Created attachment 494810 [details] [diff] [review]
rebased passing sandbox w/ dependent fixes (some still pending review)
Attachment #484752 - Attachment is obsolete: true
Attachment #494810 - Flags: superreview?(lhansen)
Attachment #494810 - Flags: review?(fklockii)
(In reply to comment #46)
> Created attachment 494810 [details] [diff] [review]
> rebased passing sandbox w/ dependent fixes (some still pending review)

Question: This code originally used GetGC to extract the GC object by inspecting the data itself.  Your earlier solutions to the problem of reliable extraction replaced GetGC with GetGCFromAddressOrValue, so we would still attempt to extract the appropriate GC by inspecting the objects themselves (which the old code attempted to do but was passing in an offset address which was not valid for large objects).

Your new solution does not do this; instead it replaces GetGC with GetActiveGC, which extracts the GC from the execution context rather than inspecting the objects on the heap.  Is this guaranteed to behave in the same manner?  That is, will the two results be the same GC, and if that is not necessarily so, is there some invariant ensuring that the difference in GC does not matter?  (Pointers to relevant documentation welcome.)
(In reply to comment #47)
> That
> is, will the two results be the same GC, and if that is not necessarily so, is
> there some invariant ensuring that the difference in GC does not matter? 
> (Pointers to relevant documentation welcome.)

Quick follow-up: it does seem like it is just assertion code using the extracted GC, and the code is careful to use functions like IsRCObjectSafe.  But even those methods, while safe (in the sense that they will not crash or assert), are still using per-gc structure like the pagemap (used in FindBeginningGuarded).  So I am a little concerned that the difference in GC could matter, though it seems like it could "at most" introduce false positives and/or negatives in the assertions. . .
Comment on attachment 494810 [details] [diff] [review]
rebased passing sandbox w/ dependent fixes (some still pending review)

r- until i get answers for comment 47 and comment 48 (at which point I'll continue the review).
Attachment #494810 - Flags: review?(fklockii) → review-
(Assignee)

Comment 50

8 years ago
You answered the question pretty well I think.  We have two choices to reliably get the GC, extract it from the value, the value at address or the thread local.  The address itself  isn't reliable and with this patch we never attempt to get it from there.   We want to do assertion checking even if we can't get it from the value (b/c its NULL), hence we use GetActiveGC.   My earlier patches also attempted to get the GC from the value at address but I realized that if value was NULL we could skip the write barrier so I just made everything in that case static and used GetActiveGC to perform the debug checking.   The extra load of the value at address was unnecessary but more importantly affected performance noticably.
(In reply to comment #50)
> You answered the question pretty well I think.

I don't think I did.  I still don't know:

-- will the two results be the same GC

-- if they are not the same GC, can the assertions yield false passes (which is acceptable to me) or false failures (which is not acceptable).
(Assignee)

Comment 52

8 years ago
Yes they should always be the same if the client follows the rules.   If they aren't we'll assert/crash.

If they are not the same GC something went horribly wrong, basically it means a MMGC_GCENTER was missing or something whacky happened.    Some of the asserts will pass with a different GC but FindBeginning will assert and return NULL since it uses the pagemap, if you continue pass the assert we'll crash.
(In reply to comment #52)
> Yes they should always be the same if the client follows the rules.   If they
> aren't we'll assert/crash.

Hmm.  I'd go along with this reasoning, but I noticed that GC::GetGC at one point asserted that gc == GetActiveGC, but Steven commented out that assertion in TR changeset 2999:

  http://hg.mozilla.org/tamarin-redux/rev/1e3ad060a02a

So it sounds to me like the assumption is broken somewhere.
(Assignee)

Comment 54

8 years ago
That assert is valid we just need to fix the clients to not violate the rules in order to enable it ;-)
(Assignee)

Comment 55

8 years ago
Even if turns out  we need to allow GetGC to be used when the active GC isn't set, it would probably be wrong to allow a write barrier to fire when the active GC isn't set, if for no other reason simply to enable us to ensure that the write barriers are being used correctly.
(In reply to comment #54)
> That assert is valid we just need to fix the clients to not violate the rules
> in order to enable it ;-)

Filed as Bug 616477

That bug should not hold up review of this one (since this discussion is contained to assertion code and those should not hold up fixing the real underlying bug here).  I'll resume review.
Attachment #494810 - Flags: review- → review?(fklockii)
Comment on attachment 494810 [details] [diff] [review]
rebased passing sandbox w/ dependent fixes (some still pending review)

Looks good, now that I grok the basis for the switch to GetActiveGC().

Uber-nit: patch introduces extraneous trailing whitespace immediately above and within GC::InlineWriteBarrierCheckedTrap.
Attachment #494810 - Flags: review?(fklockii) → review+

Comment 58

8 years ago
InlineWriteBarrierGuardedTrap would be a better name for what you call InlineWriteBarrierCheckedTrap ("check" normally having the meaning of "throw an error if it doesn't hold").

Speaking of that method, its invariant is weaker than before in the case of RC storage (no tests on the low bits of container and pointer).  Not obvious that it matters.  Discuss.

I'm not ecstatic about the extra condition introduced into the barrier for the null test but it does seem like the only rational approach; calling GetActiveGC() on the hot path seems less desirable.

Updated

8 years ago
Attachment #494810 - Flags: superreview?(lhansen) → superreview+
(Assignee)

Comment 59

8 years ago
Player blocker Watson 2768575 fixed.  Will push this today
(Assignee)

Comment 60

8 years ago
(In reply to comment #58)
> InlineWriteBarrierGuardedTrap would be a better name for what you call
> InlineWriteBarrierCheckedTrap ("check" normally having the meaning of "throw an
> error if it doesn't hold").

Cool, will go with Guarded.

> Speaking of that method, its invariant is weaker than before in the case of RC
> storage (no tests on the low bits of container and pointer).  Not obvious that
> it matters.  Discuss.

I cleaned this up so that the address assertion moves to WriteField so all paths assert that now.  The container assert isn't necessary because IsPointerToGCObject will cover it, I've removed it.

> I'm not ecstatic about the extra condition introduced into the barrier for the
> null test but it does seem like the only rational approach; calling
> GetActiveGC() on the hot path seems less desirable.

Yeah, its a trade off but a good one I think.

Comment 61

8 years ago
changeset: 5634:187d01e5fc76
user:      Tommy Reilly <treilly@adobe.com>
summary:   Bug 525875 - static GC write barriers broken (GC::GetGC is brittle) (r=fklockii,sr=lhansen)

http://hg.mozilla.org/tamarin-redux/rev/187d01e5fc76
(Assignee)

Updated

8 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago8 years ago
Resolution: --- → FIXED

Updated

7 years ago
Flags: flashplayer-bug+
You need to log in before you can comment on or make changes to this bug.