Closed Bug 547066 Opened 14 years ago Closed 14 years ago

With Argo player, application loading time is very long while profiling compared to earlier players

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P3)

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: mike, Assigned: mike)

Details

Attachments

(1 file, 1 obsolete file)

Tracking player bug 2547382, which is a bug in Tamarin.
Assignee: nobody → mmoreart
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.1
Attached patch Proposed patch (obsolete) — Splinter Review
Fix bug 547066: Performance problems in profiler: In profiler builds, MethodInfo::getMethodName() now caches the method name, because the same method names are requested over and over.
Attachment #427858 - Flags: superreview?(edwsmith)
Attachment #427858 - Flags: review?(treilly)
Comment on attachment 427858 [details] [diff] [review]
Proposed patch

Delegating SR to stejohns, who's more familiar with this code.

I think this is the first time I've ever seen "mutable" in production code -- neat.
Attachment #427858 - Flags: superreview?(edwsmith) → superreview?(stejohns)
I saw "mutable" the other day in StringObject.h -- that's what reminded me that it exists :-)
We allocate a bazillion MethodInfos -- it sucks that we have to re-add a field to them.

I'm tempted to say that we should only save the value under #ifdef AVMPLUS_SAMPLER *and* when there's a Sampler already attached...
Good idea to only cache the name if there's a Sampler attached.  Regarding memory consumption, I did do some tests, though there are more I could do: I compared the memory consumption of a profile run with the cache against a profile run without it (well, actually, I left the field in the header but never set it); the difference was almost nonexistent, and the run with the cache was much, much faster.

What I did not test:
- The memory impact of the extra field in the header
- Memory changes in non-profile runs, e.g. a regular run with the old code vs. the new code

I can test those cases on Monday.

Another thing I wanted to mention: the reason this got slower is that SamplerScript::set_stack() now calls StackTrace::Element::name() (which in turn calls MethodInfo::getMethodName()), where it used to just access an already-created string.  The string manipulation and resulting GCs done in MethodInfo::getMethodNameWithTraits() are what ended up causing a huge slowdown.

Re: your suggestion of only caching the name if a Sampler is attached: I'm writing a new patch, but I'm not sure about something -- here is what I have so far for the last part of getMethodName(), where it saves the method name:

#ifdef AVMPLUS_SAMPLER
    Sampler* sampler = declaringTraits ? declaringTraits->core->get_sampler() : NULL;
    if (sampler && sampler->sampling())
        _methodName = methodName;
#endif

So I'm protecting against dereferencing a null pointer in case declaringTraits is null, since it seems to me that it might be.  But if declaringTraits is null, the "if" clause will always evaluate to false, and so I will not cache the method name.

So my questions are:

1. Is there a better way to get an AvmCore than by going through declaringTraits->core?

2. Is it rare for declaringTraits to be null?  If so, then maybe the above code is good enough.

3. If declaringTraits can be null often enough for us to worry about, another way to write it would be this: if declaringTraits is null, then rather than treating that as a case where I *don't* cache the method name, I treat it as a case where I *do* cache it:

#ifdef AVMPLUS_SAMPLER
    bool cache = false;
    if (declaringTraits)
    {
        Sampler* sampler = declaringTraits->core->get_sampler();
        if (sampler && sampler->sampling())
            cache = true;
    }
    else
    {
        // declaringTraits is null, so we don't know if there is a Sampler, so
        // go ahead and cache the method name
        cache = true;
    }

    if (cache)
        _methodName = methodName;
#endif
Attached patch Revised patch — Splinter Review
Revised to only cache the method name if a sampler is attached *and* it is currently sampling.
Attachment #427858 - Attachment is obsolete: true
Attachment #427916 - Flags: superreview?(stejohns)
Attachment #427916 - Flags: review?(treilly)
Attachment #427858 - Flags: superreview?(stejohns)
Attachment #427858 - Flags: review?(treilly)
(In reply to comment #5)
> I can test those cases on Monday.

I'm not sure if more testing is warranted; short of having an out-of-band name cache, this is likely the best we can do. That said, this isn't too bad: the cost for "normal" players is zero, and the cost for Debugger (aka Sampler) players is an extra unused pointer field unless the sampler is actually in use. In the long run, moving this sort of stuff out-of-band (e.g. to a hashtable) would probably be nice, but that is too big a change at this point.
 
> 1. Is there a better way to get an AvmCore than by going through
> declaringTraits->core?

I don't think so.
 
> 2. Is it rare for declaringTraits to be null?  If so, then maybe the above code
> is good enough.

Yes: it should never be null by the time a MethodInfo is resolved. Checking for null is still a good idea. (You may also want to assert this fact.)
Attachment #427916 - Flags: superreview?(stejohns) → superreview+
+1 on limiting this to when the sampler is active.  Tying it to whether declaringTraits is NULL or not feels wrong.  Suggest using GCHeap::GetActiveGC()->GetAttachedSampler() instead.

We need to check that the allocation doesn't occur at a point that will trip up the sampler or skew profile results.  I.e. the allocation can't occur while recording the sample; in the previous version the allocation was done early on purpose so that no allocations happened during Sample recording.  The allocation should either happen really early (when creating the MethodInfo) or really late (when processing the samples to create the actual Sample objects).  Really late is preferred b/c sampling is usually paused while processing the samples, so if we do the allocations then they won't be recorded.
Mutable is new to me -- what's the rationale/effect of using it?
Note that if all MethodInfos are const and we're concerned about the space issue here, we could move this to the Sampler, i.e. have a MethodInfo->name hashtable in the sampler.  This has a couple of benefits: a) we don't waste space on Debugger targets that aren't sampling (i.e. AIR all the time), and b) we can make it a more space-effective cache by limiting its size or periodically dumping it.
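
To sketch the shape of such a sampler-owned cache (purely illustrative: the names are made up, and std::unordered_map stands in for whatever GC-aware table would really be used):

    #include <string>
    #include <unordered_map>

    class MethodInfo;  // stand-in forward declaration; the real type lives in MethodInfo.h

    // Hypothetical sampler-owned name cache: nothing is added to MethodInfo itself,
    // so Debugger builds that never sample pay no per-method cost.
    class SamplerNameCache
    {
    public:
        const std::string& nameFor(const MethodInfo* mi)
        {
            auto it = _names.find(mi);
            if (it == _names.end())
                it = _names.emplace(mi, computeName(mi)).first;  // fill on first lookup
            return it->second;
        }

        // Bound the cache, or drop stale entries that would otherwise pin code.
        void clear() { _names.clear(); }

    private:
        // Placeholder for the real name construction (getMethodNameWithTraits etc.).
        std::string computeName(const MethodInfo*) { return std::string(); }

        std::unordered_map<const MethodInfo*, std::string> _names;
    };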
mutable fields can be modified by a const method.

Steven thinks a side hashtable is too much scope creep for this bug; if that is the consensus, let's create a new bug for the side table.  In the long run, I do agree a sampler-specific cache (table) is better than hanging sampler-specific fields on MethodInfo.

Risks of a side table, all manageable but worth noting:
 - excessive growth
 - stale entries pinning code
 - lookup cost
Interesting thoughts on mutable:

http://www.highprogrammer.com/alan/rants/mutable.html

Basically it argues that mutable shouldn't be used to defeat const, but only in true exceptions where the mutable field isn't a logical part of the object.  Seems like this usage qualifies.
I'm fine with submitting this fix as is, with the caveat of how we obtain the Sampler*, and then creating a new (post-Argo) bug to create a MethodInfo/name cache.
(In reply to comment #8)
> We need to check that the allocation doesn't occur at a point that will trip up
> the sampler or skew profile results.  I.e. the allocation can't occur while
> recording the sample; in the previous version the allocation was done early on
> purpose so that no allocations happened during Sample recording.  The allocation
> should either happen really early (when creating the MethodInfo) or really
> late (when processing the samples to create the actual Sample objects).
> Really late is preferred b/c sampling is usually paused while processing the
> samples, so if we do the allocations then they won't be recorded.

Currently, it takes place late, when the profiler's ActionScript code starts iterating over the values returned by flash.sampler.getSamples().
(In reply to comment #12)
> Interesting thoughts on mutable:
> 
> http://www.highprogrammer.com/alan/rants/mutable.html
> 
> Basically it argues that mutable shouldn't be used to defeat const, but only in
> true exceptions where the mutable field isn't a logical part of the object.
> Seems like this usage qualifies.

That blog is correct, but my usage does not violate that principle.  I am using "mutable" to create a cache that does not change the contract of the class and leaves const objects logically unchanged; that's exactly the kind of thing "mutable" is meant for.  In my usage, the MethodInfo is logically constant (its method name is not going to change as a result of someone calling getMethodName()), but its internal representation needs to change (I need to cache the calculated name).

If someone expanded on my code and wrote a setMethodName() that set the value of the mutable _methodName field, that usage would be incorrect.
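
A minimal standalone sketch of that pattern, with made-up names and std::string standing in for the Stringp the real code uses:

    #include <string>

    // Hypothetical illustration of the mutable-cache pattern described above; the
    // real MethodInfo uses avmplus/MMgc types (e.g. Stringp), not std::string.
    class MethodInfo
    {
    public:
        // Logically const: a method's name never changes, so the getter stays const
        // even though it fills in the cache on first use.
        const std::string& getMethodName() const
        {
            if (_methodName.empty())
                _methodName = computeMethodName();  // expensive name construction, done once
            return _methodName;
        }

    private:
        std::string computeMethodName() const { return "Object/methodName"; }

        // mutable: the cached name is an internal detail, not part of the object's
        // logical state, so a const member function is allowed to fill it in.
        mutable std::string _methodName;
    };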
(In reply to comment #14)
> Currently, it takes place late, when the profiler's ActionScript code starts
> iterating over the values returned by flash.sampler.getSamples().

Great, as it should be.
(In reply to comment #15)
> That blog is correct, but my usage does not violate that principle. ...

Agreed -- when I said "Seems like this usage qualifies," I meant it qualifies as a valid use of mutable.
(In reply to comment #11)
> Risks of a side table, all manageable but worth noting:
>  - excessive growth
>  - stale entries pinning code
>  - lookup cost

Re "excessive growth", observe that we're already paying for O(worst case growth) with the current structure of the cache (in-line in every object).

IMO a side table will be a requirement but we can live with the current structure.

I haven't looked at the structure of the MethodInfo at present, but the cost of this extra field is either 0 bytes or 8 bytes, depending on how the rounding works out, and it could be more than 8 bytes, if the allocator crosses a size class boundary and the object is large enough not to hit the 8-spaced size classes (not terribly likely).
(In reply to comment #18)
> I haven't looked at the structure of the MethodInfo at present, but the cost of
> this extra field is either 0 bytes or 8 bytes, depending on how the rounding
> works out, and it could be more than 8 bytes, if the allocator crosses a size
> class boundary and the object is large enough not to hit the 8-spaced size
> classes (not terribly likely).

Good point I hadn't considered. It may be worth a quick experiment to see if a side-table approach is workable for Argo.
Back of the napkin shows sizeof(MethodInfo) == 44; we should check whether this optimization is in fact free from a memory perspective (b/c the size class would be 48).
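
For what it's worth, a tiny standalone illustration of the rounding (this assumes the 8-spaced size classes mentioned above, which may not match MMgc's real size-class table):

    #include <cstdio>

    // Rounds a requested size up to the next 8-byte-spaced size class (assumed spacing).
    static unsigned roundToSizeClass(unsigned bytes)
    {
        return (bytes + 7u) & ~7u;
    }

    int main()
    {
        printf("44 -> %u\n", roundToSizeClass(44));  // 48: a 4-byte field would be free
        printf("48 -> %u\n", roundToSizeClass(48));  // 48: adding 4 bytes lands in the next class
        printf("52 -> %u\n", roundToSizeClass(52));  // 56: matches the measurement below
        return 0;
    }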
I just tested it.  Before my change:

    sizeof(MethodInfo) = 48
    GC::Size() = 48

After my change:

    sizeof(MethodInfo) = 52
    GC::Size() = 56
Great!  So now we know: 8 bytes for this pointer on 32-bit platforms.
Attachment #427916 - Flags: review?(treilly) → review+
tamarin-redux-argo: changeset:   3733:f5ca573916d1
tamarin-redux: changeset:   3879:f5ca573916d1
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
QA Contact: vm → cpeyer
Flags: in-testsuite?
Verified fixed in Watson 2547382.
Status: RESOLVED → VERIFIED
Flags: in-testsuite? → in-testsuite-