Closed Bug 1152829 Opened 6 years ago Closed 5 years ago
Performance sources are not properly synced
See the attached screenshot for example. The FPS drop happens before any activity in the flame chart.
Happens on my MBP as well. Timeline data and FPS look aligned. But profiler data are "late", off by ~30ms.
Summary: I'm not sure the FPS graph is properly synced with profiler data → FPS graph is not properly synced with profiler data
fitzgen noticed that the flamegraph is off with the waterfall as well. Based on the data sources, waterfall/timeline have one set of times, and the profiler/flamegraph another. It's especially obvious when selecting just one JS function call marker in the waterfall: the flame graph shape does not correlate with it very well. Victor, you did a lot of the timing stuff, pinging you here 'cause it's pretty important.
Summary: FPS graph is not properly synced with profiler data → Performance sources are not properly synced
Severity: normal → major
Status: NEW → ASSIGNED
We discussed this in email a bit. The ideal solution would be to use the same epoch and resolution across all tools.

Inside Firefox there are 3 time sources: PR_Now (out of the question due to various issues, see bug 363258), PRMJ_Now, and TimeDuration. In timeline we've been using TimeDuration with TimeStamp::ProcessCreation as the epoch, and representing this to JS as DOMHighResTimeStamp -- basically a "double" in units of milliseconds (but we preserve fractional bits so that the effective resolution is higher). TimeDuration is a bit of a painful choice because it isn't available in SpiderMonkey and it has a non-trivial representation.

Another wrinkle is that entries in the profiler circular buffer are word-sized. Currently the profiler casts a TimeDuration's double to a float: https://dxr.mozilla.org/mozilla-central/source/tools/profiler/ProfileEntry.cpp?from=ProfileEntry.cpp#615

One more issue is overflow. A float has 24 mantissa bits, so it can exactly represent integers only up to 2^24 = 16777216. Considered as milliseconds, that is less than 5 hours. A consequence of this is that simply changing the profiler to use ProcessCreation would lose precision after Firefox had been running for a while.

One way out of all this would be to use PRMJ_Now everywhere, with caveats. As far as I can tell, PRMJ_Now doesn't suffer from the problems of PR_Now; it has acceptable resolution; it is available everywhere; and it has a consistent epoch. So it meets all the criteria for a fix.

For the profiler, could we use uint32_t rather than float? This would avoid growing the circular buffer and would also avoid any precision loss problems. 32 bits of microseconds is about 4295 seconds. This would mean that the timestamps would wrap if a profiling experiment ran for roughly 72 minutes, but that seems acceptable to me. Here I'm assuming that we do want micro- rather than millisecond resolution. It seems cleanest to just accept the 32-bit restriction for all uses; this would minimize confusion across tools.
I am not sure what to do about the framerate case. Using performance.now is a bit weird because it is reset when the page is reloaded, but it seems very useful to support profiling across page reloads.

Some bit of JS code will have to record the start time for use as an epoch, and compute an unsigned difference against the recorded times, to avoid wrapping issues when the experiment is started near the point where the low 32 bits of PRMJ_Now wrap.

Note that because PRMJ_Now has an internally consistent epoch, it is tempting to just let each tool send back its own start time. However, this is a bit of a pain if one actor starts before the value wraps and another starts after. It's better to have a single start time, since otherwise one must use a heuristic in that case.
For the profiler case, djvj told me that we can change the timestamps' data type without any penalties to the circular buffer or profile entry size. He's CCd here. For the framerate actor, while performance.now() is cool, ideally PRMJ_Now would be exposed to JS land, where it can be queried. I don't want us to use anything other than the same time measurement system everywhere, and this includes framerate. I also don't want us to deal with multiple start times. There should be a single epoch, and a single time at which we query a start time (that is, when a recording starts). This will only be used to make the UI pretty, not the data "correct".
Great write-up, Tom. Consistent epoch, meaning that if the profiler starts at 0, then a second later the markers start (at 1 000 000 in microseconds)? Lagging start times are fine as long as X from the profiler is the same moment as X from another source. I think overflow is fine; in the event the profiler does not stop, we'll still have some start time that can handle adding MAX_TIME to values less than STARTTIME for normalization. And yes, is this accessible to JS land?
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #7)

> Great write up, Tom.
>
> Consistent epoch, meaning if the profiler starts it, at 0, a second later
> the markers start (at 1 000 000 in microseconds)? Lagging start times is
> fine as long as X from the profiler is the same moment as X from another
> source.

A consistent epoch just means that the times you get can be directly compared. Having different epochs can be ok if you can find the epoch of each time source, so you can offset the various times onto the same scale. But right now there's no way to, e.g., get sStartTime, which is what the profiler uses as an epoch.

The gross part here is that if we wind up using uint32_t, then due to the wrapping business we'll need a bit of processing before the times can be compared. However, it's sufficient to get a single start time on the server and use that -- you won't need a separate start time per actor or anything like that.

> And yes, is this accessible to JS land?

If everything switches to the same time source then nsIDocShell.now() will be fine.
(In reply to Tom Tromey :tromey from comment #5)

> In timeline we've been using TimeDuration with TimeStamp::ProcessCreation
> as the epoch; and representing this to js as DOMHighResTimeStamp --
> basically a "double" in units of milliseconds (but we preserve fractional
> bits so that the effective resolution is higher).

I realized later that this confuses two different ideas. One concept is the units -- basically what "1" means. DOMHighResTimeStamp has units of milliseconds. The other concept is the resolution -- how accurate the value is. Here I think the values are accurate at least to microseconds.
(In reply to Victor Porof [:vporof][:vp] from comment #6)

> For the profiler case, djvj told me that we can change the timestamps' data
> type without any penalties to the circular buffer or profile entries size.
> He's CCd here.

That's not true: right now the 't' tag in the circular buffer is a float and is unioned with pointer types. If we change it to double, or int64, or whatever, it'll double the size of circular buffer entries on 32-bit.
(In reply to Shu-yu Guo [:shu] from comment #10)

> (In reply to Victor Porof [:vporof][:vp] from comment #6)
> > For the profiler case, djvj told me that we can change the timestamps' data
> > type without any penalties to the circular buffer or profile entries size.
> > He's CCd here.
>
> That's not true, right now the 't' tag in the circular buffer is a float and
> is unioned with pointer types. If we change it to double, or int64 or
> whatever, it'll double the size of circular buffer entries on 32bit.

Oh, never mind, I see Tom is proposing 32bit Unix epoch in ms.
Isn't PRMJ_Now SpiderMonkey internal?
(In reply to Shu-yu Guo [:shu] from comment #12)

> Isn't PRMJ_Now SpiderMonkey internal?

It is exported as JS_Now.
So it sounds like the PRMJ_Now stuff will work, except we don't have a plan for getting that in the profiler -- it sounded like there were two options:

1) Expose PRMJ_Now to Gecko
2) Use another kind of timestamp that'll work with 32bit and not inflate the buffer size?

Tom, Shu, what can we do for this? This is the last piece of the puzzle to get started on all the syncing.
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #14)

> So it sounds like the PRMJ_Now stuff will work, except we don't have a plan
> for getting that in the profiler -- it sounded like there were two options:
>
> 1) expose PRMJ_Now to Gecko

This isn't an issue; PRMJ_Now is exposed as JS_Now.

> 2) Use another kind of timestamp that'll work with 32bit and not inflate the
> buffer size?
>
> Tom, Shu, what can we do for this? This is the last piece of the puzzle to
> get started on all the syncing.

There are two choices for the profiler. One is to increase the size of the profile elements to 64 bits on 32-bit machines. I am not sure if this is possible; Shu should say. The other option is to truncate the 64-bit times to 32 bits. To make this work sanely, what we'll need is a way to get the start time from the server -- that is, a server time in this format that comes before any of the tools have been started. This is needed to deal with the case where the experiment is started near the time where the 32-bit value wraps.
(In reply to Tom Tromey :tromey from comment #15)

> that is, a server time in this format that comes before any of the tools
> have been started.

Sorry, not tools. What I mean is that the user presses the "start" button on the profile tool. This causes some things to happen on the server -- enabling the timeline in the docshell, etc. What we need is a server-generated time corresponding roughly to this button press, that is guaranteed to happen before things like turning on the timeline.
We discussed this on irc and agreed to press forward with the plan of using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use a 64-bit time type, even on 32 bit machines.
(In reply to Tom Tromey :tromey from comment #17)

> We discussed this on irc and agreed to press forward with the plan of
> using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use
> a 64-bit time type, even on 32 bit machines.

After attempting to convert the profiler in bug 1159486, I think we have to rethink this plan. The profiler just touches too many places, including getting timestamps from other general platform facilities that aren't directly profiler-related, e.g., VsyncDispatcher.

So I think we should pursue a two-part plan. First, use the same epoch for all things (process creation time) and change the allocation times to use some sort of hack to work around the lack of TimeStamp in SpiderMonkey. Second, done separately to avoid schedule crunch, try to move TimeStamp to mfbt and remove any hacks done for part 1.
First patch; this will sync all the time sources once all epochs are common across sources. Will add tests once the underlying dependencies are good to go.
Whiteboard: [devedition-40] → [polish-backlog]
Trying out the patch in bug 1172180 with this in context -- it seems that the markers and profiler data are in sync. Framerate data doesn't seem to line up. http://i.imgur.com/cdSYnvN.png
No longer blocks: perf-tools-fx42
Paul, would you mind checking this out to see if things are sync'd for your use case?
It works! Awesome! Thanks a lot.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME