Closed Bug 555031 Opened 15 years ago Closed 15 years ago

Profiler timing results have significant errors around slow functions and idle time

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P3)

Tracking

(Not tracked)

VERIFIED FIXED
flash10.1

People

(Reporter: mike, Assigned: mike)

References

Details

Attachments

(2 files)

The profiler's results have significant errors due to several interrelated bugs (the fixes are intertwined and can't be untangled from each other, so I am logging these together): 1. When exiting any ActionScript function (including [native] ones), we first pop the function off the ActionScript callstack, and then call Sampler::checkSample(). This is the wrong order. For example, suppose an AIR app opens a huge local file and then reads it with FileStream.readBytes(), and suppose that call take 5 milliseconds. Obviously the sample should show FileStream.readBytes() at the top of the callstack. But because we first pop the stack and then call Sampler::checkSample(), the sample won't show FileStream.readBytes() at all! Instead, the function that called FileStream.readBytes() will be at the top of the stack, so it will appear that the entire 5 milliseconds was spent in that function, and no time was spent in FileStream.readBytes(). 2. Similarly, when entering any ActionScript function, we first push the new function on the ActionScript callstack, and then call Sampler::checkSample() to let it take a sample if it wants to. This is the wrong order: The function you are leaving is the one that was running when the millisecond timer rolled over, so we should first call Sampler::checkSample(), and then push the new function on the callstack. 3. The profiler does not expose any concept of "idle" time, by which I mean time spent in the host (e.g. Flash Player) but not executing any ActionScript code. The reason this is a problem is that it leaves a profiler, such as Flash Builder's profiler, with no reliable way of determining how much time was spent in each function. There are only two possible ways for a profiler to allocate time based on the samples that we return: Either (a) assume each sample covers exactly 1 millisecond, or (b) assume each sample's duration is equal to its timestamp minus the preceding sample's timestamp. Approach (a) will often yield roughly accurate results, but in some cases it will be significantly wrong, such as in the FileStream.readBytes() example above. In general, any native function that takes more than a millisecond to finish will cause the numbers to be off. Although it isn't often that a native function takes more than a millisecond, we must not neglect that case, because the whole point of the profiler is to help find those slow functions. So this leaves approach (b) as the correct one. But the problem is that in the absence of idle events, the subtraction method will yield results that are off by a lot. For example, suppose the user starts a swf, then waits ten seconds, and then moves the mouse over the swf. During those ten seconds, if nothing at all happens in ActionScript, then the AS callstack is completely empty, so no profiler samples will be taken. Then, when the mouse-move event happens, a sample will be taken. But the time difference between this sample and the previous one is ten seconds, implying that ten seconds were spent in the mouse-move handler. This is clearly wrong. The solution is: Whenever pushing a function on the ActionScript callstack, if the callstack is completely empty and it's time to take a sample, then first push a fake "[idle]" event on the callstack and take a sample; then pop the "[idle]" event, push the real function call, and carry on. Must-fix for flash10.1. I have this fixed locally, will attach the patch soon.
Assignee: nobody → mmoreart
Status: NEW → ASSIGNED
Attached patch FixSplinter Review
Fix attached. I have tested this fix extensively with the Flash Builder profiler. Description of fix: - Created new fake function "[idle]". - Modified the various overloads of CallStackNode.init() so that before pushing, they first call a new function in AvmCore, sampleCheckBeforeFunctionEnter(), which is similar to AvmCore::sampleCheck() except that it also checks whether the stack is entirely empty, and if it is (and it's time for a sample), samples the "[idle]" event. - Changed the order of things in CallStackNode.reset(), which is called by ~CallStackNode(), so that we call AvmCore::sampleCheck() before popping the stack. - A strange quirk: It turns out that sometimes, the same CallStackNode is essentially popped off the stack twice!!! E.g. if ActionScript code calls a [native] function, then when it is exiting, first MethodInfo::debugEnterExitWrapper32() calls MethodEnv::debugExit(), which pops the stack with "core->callStack = callstack->next", and then, MethodInfo::debugEnterExitWrapper32() implicitly invokes ~CallStackNode, which does the same thing again! Luckily, the way it is done does not corrupt the stack. We should clean this up, but since it is harmless, I don't want to take that risk right now. So I had to add another AvmCore::sampleCheck() call to MethodEnv::debugExit(), before it pops the stack.
Attachment #435009 - Flags: superreview?(edwsmith)
Attachment #435009 - Flags: review?(treilly)
Something makes me nervious about creating a transient callstacknode (i.e. a fake AS3 call stack) just for the sake of generating an [idle] sample. Does the sampler ever stop sampling? If not then we are doing something every 1ms, would it be possible to detect that no AS3 is running, and create an [idle] sample at that point, instead of doing it lazily during the next AS3 sample? (or, in the future, have the sampler inspect a global state variable providing hints as to what the player is doing... sleeping, waiting on i/o, rendering, running as3). then if running as3, take an as3 sample with an as3 stack trace. otherwise take an "activity" sample with no as3 stack trace. > - A strange quirk: followup bug would be good.
> Something makes me nervious about creating a transient callstacknode (i.e. a > fake AS3 call stack) just for the sake of generating an [idle] sample. Not sure if you know this, but we already make extensive use of fake callstack nodes, and they serve us well -- they allow the profiler to easily expose time spent during a lot of different things that are not directly ActionScript code. Tamarin includes [mark], [sweep], [verify], and a number of others; the Flash Player adds more such as [render]. These provide a nice easy way to let the profiler tell the user about non-ActionScript stuff even though it is mostly an ActionScript-based profiler. Another alternative I considered, instead of pushing an [idle] event, was to create a sample that had an entirely empty callstack (Sample.stack would be a zero-length array). However, the current Tamarin never generates samples with an empty callstack, so this would be an API change. (The asdoc for Sample.stack does not address the issue of empty callstacks, but in practice, it currently never generates them.) I tried changing the code in this way, and Flash Builder's profiler couldn't handle it -- it assumes the stack is always non-empty. However, fixing Builder is easy. If you would feel more comfortable with an empty callstack (it isn't really clear to me whether that is your concern), we could do that, either by adding an API like flash.sample.allowIdleSamples(allow:Boolean) or flash.sample.allowEmptyCallstacks(allow:Boolean), or via some other versioning mechanism. Do we have a way to get some sort of version number associated with our caller? Personally I sort of like this idea -- allowing clients to opt-in to empty callstacks -- a little better than my current approach of pushing [idle] events. It's a little messy to have to add a function to the API for it though. Do you have an opinion on that? Does that address your concern, or not? > Does the sampler ever stop sampling? If not then we are doing something every > 1ms, would it be possible to detect that no AS3 is running, and create an > [idle] sample at that point, instead of doing it lazily during the next AS3 > sample? I don't think we can do this (certainly not in flash10.1). Samples are generated on the main thread, the same thread that is running ActionScript. A background thread is triggering a timer once per millisecond, but all that thread does is to set a flag; the main thread periodically checks the flag to see if it's time to take a sample. Since there is no way to control what the host app is doing when it isn't calling Tamarin, the only way to actually generate a timer event once per millisecond would be to have our background thread actually push the samples. But I don't really understand your concern -- it isn't clear to me how that would be better than generating idle events at function entry. To me, doing them at function entry is better: - By doing it lazily, we don't waste time doing unnecessary work - We create fewer idle samples (only one idle sample for a long period of time, instead of a bunch for however many milliseconds have passed) -- so we take less memory
> > - A strange quirk: > > followup bug would be good. Logged bug 555299
Catching up on this bug. Is what we really want here a way for FB to know the AS execution context boundaries? Like if we forced a sample at every execution enter/exit (including native exits/re-enters) we'd solve this right? So there would always be at least two samples for every "main" function and every native function (enter/exit). Implementation would easy/simple I think. If FB knows which the native function bounary samples are then we'd be all set I think. My only concern is silly native functions that are basically just getter/setters being called alot and generating too many samples but since the 1ms sample time is already too granular for todays systems more samples might improve the profiling experience.
Comment on attachment 435009 [details] [diff] [review] Fix There are other ways to fix this too. Tommy and Edwin, let's discuss this on the phone tomorrow. For now, I'm withdrawing the review and superreview requests, to get them out of your queues.
Attachment #435009 - Flags: superreview?(edwsmith)
Attachment #435009 - Flags: review?(treilly)
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.1
Attached patch Alternate patchSplinter Review
Okay, here is an alternate patch that takes a different approach to fixing the accuracy problems. This is a smaller change in behavior than patch 435009. (I am not marking that patch as obsolete just yet, but it probably is.) As with the previous patch, this one changes the callstack code in StackTrace.cpp so that the profiler is given its chance to take a snapshot *before* we push/pop the callstack, rather than after. However, I removed the part about taking snapshots of idle time. The key change in this new patch is that Sampler::sample() may now create more than one sample -- e.g. if it sees that 5.5 milliseconds have elapsed, it will write 5 samples. So e.g. if the user calls a slow native function, e.g. calls AIR's FileStream.readBytes() to read a large file, there will be one timer tick per millisecond, so a profiler such as Flash Builder's will accurately account for the time spent in that function. (Today, because of this Tamarin bug, Flash Builder will never report more than one millisecond each time a native function is called.) As with the previous patch, I have tested this exhaustively with Flash Builder's profiler, and it generates much more accurate results than we have today. (Today's results can be wildly wrong in some cases.) Also, this new patch has the advantage that it does not require any changes in profilers such as FB: Each timer tick still represents 1 millisecond, which is what FB already assumes to be the case.
Attachment #437146 - Flags: superreview?(edwsmith)
Attachment #437146 - Flags: review?(treilly)
This may be an improvement but its not really accurate. The 5 millis could have been the OS putting another process on the CPU. Not that I think the patch is a bad idea, just saying. But this fix will make a smoother transition to when we have more accurate, higher resolution sampling.
I agree that those 5 millis could have been spent in another process, and in fact I did see that happen occasionally in my testing; but this is a common problem in many profilers (which is one reason why it's usually a good idea to make sure your computer isn't running other CPU-intensive processes while you are doing your profile run). The only way I can think of for any profiler to avoid that problem is to detect when the OS gives time slices to other processes. When I saw this in my testing, it skewed the results slightly, but not enough to be misleading. For example, I had one slow function that was called about 1000 times, and each call took an average of 1ms; it was interrupted one time by a context switch, which added something like 80ms.
Attachment #437146 - Flags: review?(treilly) → review-
Comment on attachment 437146 [details] [diff] [review] Alternate patch I'm fine with the moving sampleCheck's around to get better info. The repetition of samples to appease FB is nasty and it occurs to me we could do also do it in the profiler agent. Seems like that would be better. The sample method is supposed to be balls simple and fast and we're going in the wrong direction if we have division, looping and visit the call stack n times.
Comment on attachment 437146 [details] [diff] [review] Alternate patch nits with the patch: - indentation is wonky - filenames are showing up incorrectly as hex #s (now i see why, but it makes reviewing harder) as for Tommy's comments, i'll remove the SR? until they're addressed. my $.02 is it seems the main reason we're not doing something more efficeint or accurate is time pressure? at this point we need to either get this into argo and live with it, or defer and rework it (maybe add a duration field to the sample and update FB to read that field, or better, to fix this whole animal to work more like typical sampling profilers (each sample captures a stack trace, rather than setting a flag to cause an arbitrarily later stack trace to be taken).
Attachment #437146 - Flags: superreview?(edwsmith)
Comment on attachment 437146 [details] [diff] [review] Alternate patch approved on the condition that we fix this post Argo to remove this cruft, should all go away when we remove the background thread up the sampling frequency and have new sample types for when we cross as/C++ boundaries
Attachment #437146 - Flags: review- → review+
Attachment #437146 - Flags: superreview?(edwsmith)
Comment on attachment 437146 [details] [diff] [review] Alternate patch The indentation problems are due to the fact that I am trying to follow our convention of using spaces to indent, but I am modifying files that use tabs. Is there something else you would rather have me do? It's hard, because I make some of my edits in Xcode, some in Eclipse, and some in Vim; although each of those editors has somewhat flexible settings for controlling tabs vs. spaces, they are all slightly different from each other and have their own quirks. If the files were all-spaces, there wouldn't be any problems, but as it is, it is hard to make the diffs look good. Not sure what you mean about tabs for the filenames -- how should I have prepared the patch differently?
(In reply to comment #10) > (From update of attachment 437146 [details] [diff] [review]) > I'm fine with the moving sampleCheck's around to get better info. The > repetition of samples to appease FB is nasty and it occurs to me we could do > also do it in the profiler agent. Seems like that would be better. > > The sample method is supposed to be balls simple and fast and we're going in > the wrong direction if we have division, looping and visit the call stack n > times. This was not an attempt to "appease FB" -- the API, as it exists today, is being correctly used by FB, but currently there is no way for any client profiler to differentiate between time spent in a slow native function vs. idle time spent entirely outside of Tamarin. My first patch took one approach at addressing that; my second one takes a different approach. There are a variety of ways we can improve this in the future, but for flash10.1, this patch, I feel, does a good job of fixing the behavior: It fixes the stream of samples such that they still adhere to the old API, but now they accurately represent the time spent in various functions. (My first patch would have required a change on the client side, e.g. Flash Builder, by having it take the difference between consecutive samples.) This function is still fast, at least on the Mac. I did profiling tests, and there was no difference before & after my change. The looping does not introduce much overhead at all -- it now loops n times per n milliseconds, which averages out to 1 loop per millisecond, which is what it was always intended to do in the first place.
Pushed attachment 437146 [details] [diff] [review] to tamarin-redux-argo and tamarin-redux as changeset 039c8eb5d12b
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Forgot to mention: Edwin had to leave for the day, but said to go ahead and push the change, and he will do his superreview after the fact.
Comment on attachment 437146 [details] [diff] [review] Alternate patch (In reply to comment #13) > (From update of attachment 437146 [details] [diff] [review]) > The indentation problems are due to the fact that I am trying to follow our > convention of using spaces to indent, but I am modifying files that use tabs. > Is there something else you would rather have me do? It's hard, because I make > some of my edits in Xcode, some in Eclipse, and some in Vim; although each of > those editors has somewhat flexible settings for controlling tabs vs. spaces, > they are all slightly different from each other and have their own quirks. If > the files were all-spaces, there wouldn't be any problems, but as it is, it is > hard to make the diffs look good. > > Not sure what you mean about tabs for the filenames -- how should I have > prepared the patch differently? plain "hg diff" output produces diffs such that the "view diff" window in bugzilla displays the filenames correctly. I'm not sure what command you used. Some people have HG set to use "git" style patches, and I suspect (without investigating) that style might confuse bugzilla's diff viewer. Its not a showstopper of course, just makes reviews a bit harder.
Attachment #437146 - Flags: superreview?(edwsmith) → superreview+
(In reply to comment #17) > plain "hg diff" output produces diffs such that the "view diff" window in > bugzilla displays the filenames correctly. I'm not sure what command you used. > Some people have HG set to use "git" style patches, and I suspect (without > investigating) that style might confuse bugzilla's diff viewer. Its not a > showstopper of course, just makes reviews a bit harder. I also did not know what Ed's comment about hex #s for filenames was referring to, but now I see what it was (namely that bugzilla's diff view is missing the filenames). I use git style patches, but I think the patches I have uploaded are still compatible with bugzilla's diff view (at least from what I can tell from reviewing other patches I have uploaded). So I do not think that is the source of the problematic patches, or at least not on its own. It may be worthwhile to isolate the problem here (and document what is recommended practice for generating patches, and/or roughly what format we like them to be in, so that one could double-check them before uploading to mozilla...) Is it hard to set up a separate isoloated bugzilla instance for such an investigation?
Ah, I see. I used "hg export" instead of "hg diff", and that is the problem. I'll use "hg diff" from now on.
I think hg export produces diffs as if you typed "hg diff -rREV1 -rREV2", so in the patch file, REV2 shows up where the filename ordinarily would show up, and bugzilla gets confused.
FYI, if all the changes are isolated to one changeset, then I think "hg log -r REV2 --patch" also produces acceptable results for bugzilla. (I also assume that the patch files used internally to Mercurial Queues are also acceptable files for bugzilla patch attachments; I have only started to experiment with [mq] so "check yourself before you wreck yourself.")
QA Contact: vm → cpeyer
Flags: in-testsuite?
Verified fixed. According to Mike Morearty testing profiler performance requires more than just avm (e.g Flash Builders ProfilerAgent.swf)
Status: RESOLVED → VERIFIED
Flags: in-testsuite? → in-testsuite-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: