Closed Bug 1152829 Opened 6 years ago Closed 5 years ago

Performance sources are not properly synced

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect, P1)

x86_64
Windows NT
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: paul, Assigned: jsantell)

References

(Blocks 1 open bug)

Details

(Whiteboard: [polish-backlog])

Attachments

(3 files)

Attached image screenshot
See the attached screenshot for example. The FPS drop happens before any activity in the flame chart.
Attached image 30ms delay
Happens on my MBP as well.

Timeline data and FPS look aligned. But profiler data are "late", off by ~30ms.
Summary: I'm not sure the FPS graph is properly synced with profiler data → FPS graph is not properly synced with profiler data
Priority: -- → P1
fitzgen noticed that the flamegraph is off with the waterfall as well. Based on the data sources, the waterfall/timeline have one set of times, and the profiler/flamegraph another. It's especially obvious when selecting a single JS function call marker in the waterfall: the flame graph shape doesn't correlate with it very well.

Victor, you did a lot of the timing stuff, pinging you here 'cause it's pretty important.
Flags: needinfo?(vporof)
We measure time in 4 different ways. This shouldn't theoretically matter, since we normalize everything to 0, but there are some subtleties that aren't handled yet and can cause bugs.

+-----------+
|Big picture|
+-----------+

1. Markers and memory consumption measurements
==============================================
  These times come from the timeline actor, namely the `now()` method on nsIDocShell. Every time markers and memory measurements are stored, the current time is attached via that docshell method. The recording's start time is also saved initially via the same method; all time values are normalized to 0 against it on the frontend as soon as they are received (before displaying).

2. Profiler data
================
  These times come from the profiler actor; each sample has a current time property, relative to when the profiler module was started. Since the profiler runs continuously, the profiler module itself may have been started before the current recording. The recording's start time is saved initially via the `getElapsedTime()` method on nsIProfiler; all time values are normalized to 0 against it on the frontend as soon as they are received (before displaying).

3. Framerate data
=================
  These values are recorded every time the refresh driver ticks, inside a requestAnimationFrame loop, via `performance.now()` from the content window. The recording's start time is saved initially via the same method; all time values are normalized to 0 immediately, on the actor, so the frontend receives them already normalized.

4. Allocations data
===================
  These times are unix timestamps IIRC, in microseconds. The recording's start time is saved initially via `Date.now()`; all time values are normalized to 0 against it (after converting to milliseconds) on the frontend as soon as they are received (before displaying).

To recap: 
* the markers and memory consumption times start from 0, normalized via the `nsIDocShell.now()` method.
* the profiler samples start from 0 (or later, if the circular buffer has overflowed), normalized and sliced via the `nsIProfiler.getElapsedTime()` method.
* the framerate times start from 0, normalized via the content's `performance.now()` method.
* the allocations times start from 0, normalized via `Date.now()` after converting to milliseconds.
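The per-source normalization described above can be sketched as follows. This is a minimal illustration of the scheme, not the actual actor code; the function and variable names are hypothetical.

```javascript
// Minimal sketch: each source records its own start time, then shifts
// every sample so the recording begins at 0.

// e.g. markers/memory: startTime obtained from nsIDocShell.now()
function normalize(samples, startTime) {
  return samples.map(t => t - startTime);
}

// Allocations arrive as microsecond unix timestamps, so they are
// converted to milliseconds before the same normalization.
function normalizeAllocations(timestampsUs, startTimeMs) {
  return timestampsUs.map(us => us / 1000 - startTimeMs);
}
```

Note that each source subtracts its *own* start time, which is exactly why the asynchronous-start-times problem below arises: the zeros don't coincide.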

+--------+
|Problems|
+--------+

1. Asynchronous start times
===========================
  Actors have different start times. Not all "zeros" after normalization fall at the exact same point in time, because there's some delay between starting each actor, both because of protocol asyncness and because each startup procedure takes actual milliseconds to finish. These deltas add up, and there's no mechanism in place to handle them at the moment.

2. Inconsistent accuracy
========================
  Since we record time in 4 different ways, each way might behave subtly differently or have internal problems that we may or may not know of. For example, Tom told me a while ago that PR_Now is wildly inaccurate, by even a few seconds on some platforms. Furthermore, I'm not sure how good Date.now() is in JavaScript land. Since we can't really record the time in exactly the same way for all the different actors, we need to investigate these issues more closely.


tl;dr: Recording time is hard and we're doing it in 4 different ways at different points in time.
Flags: needinfo?(vporof)
Summary: FPS graph is not properly synced with profiler data → Performance sources are not properly synced
Whiteboard: [devedition-40]
Assignee: nobody → jsantell
We discussed this in email a bit.

The ideal solution would be to use the same epoch and resolution
across all tools.

Inside Firefox there are 3 time sources: PR_Now (out of the question
due to various issues, see bug 363258), PRMJ_Now, and TimeDuration.

In timeline we've been using TimeDuration with TimeStamp::ProcessCreation
as the epoch; and representing this to js as DOMHighResTimeStamp --
basically a "double" in units of milliseconds (but we preserve fractional
bits so that the effective resolution is higher).

TimeDuration is a bit of a painful choice because it isn't available
in SpiderMonkey and it has a non-trivial representation.

Another wrinkle is that entries in the profiler circular buffer are
word-sized.  Currently the profiler casts a TimeDuration's double
to a float:

https://dxr.mozilla.org/mozilla-central/source/tools/profiler/ProfileEntry.cpp?from=ProfileEntry.cpp#615

One more issue is overflow.  A float has 24 mantissa bits, so it can
exactly represent integers only up to 16777216.  Considered as milliseconds,
that is less than 5 hours.  A consequence of this is that simply changing
the profiler to use ProcessCreation would lose precision after Firefox
had been running for a while.
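The precision loss is easy to demonstrate from JS with `Math.fround`, which rounds a double to the nearest 32-bit float and so simulates the profiler's double-to-float cast:

```javascript
// A 32-bit float has a 24-bit significand, so integers above 2^24
// (16777216) can no longer all be represented exactly.
const limit = Math.pow(2, 24);                      // 16777216
console.log(Math.fround(limit) === limit);          // true: still exact
console.log(Math.fround(limit + 1) === limit + 1);  // false: rounded away
// 16777216 ms is about 4.66 hours:
console.log(limit / (1000 * 60 * 60));
```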


One way out of all this would be to use PRMJ_Now everywhere, with caveats.
As far as I can tell, PRMJ_Now doesn't suffer from the problems of PR_Now;
it has acceptable resolution; it is available everywhere; and it has a
consistent epoch.  So, it meets all the criteria for a fix.

For the profiler, could we use uint32_t rather than float?
This would avoid growing the circular buffer and would also avoid
any precision loss problems.

32 bits of microseconds is about 4295 seconds, roughly 71.6 minutes.  This
would mean that the timestamps would wrap if a profiling experiment ran
longer than that, but that seems acceptable to me.  Here I'm assuming that
we do want micro- rather than millisecond resolution.

It seems cleanest to just accept the 32 bit restriction for all uses.
This would minimize confusion across tools.

I am not sure what to do about the framerate case.  Using performance.now
is a bit weird because it is reset when the page is reloaded, but it seems
very useful to support profiling across page reloads.


Some bit of js code will have to record the start time for use as an epoch,
doing an unsigned difference using the recorded times, to avoid wrapping
issues when the experiment is started near the point where the low 32 bits
of PRMJ_Now wrap.

Note that because PRMJ_Now has an internally consistent epoch, it is
tempting to just let each tool send back its own start time.  However, this
is a bit of a pain if one actor starts before the value wraps and another
after.  It's better to have a single start time, since otherwise one must
use a heuristic in that case.
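The wrap-safe unsigned difference mentioned above can be sketched in JS; `elapsedUint32` is a hypothetical helper, not code from any patch here. The `>>> 0` operator coerces the (possibly negative) difference back into the uint32 range, which is exactly modular subtraction, so the result is correct as long as the true interval is under 2^32.

```javascript
// Hypothetical helper: elapsed time between two uint32 timestamps
// that may have wrapped around zero.
function elapsedUint32(start, end) {
  return (end - start) >>> 0; // unsigned 32-bit modular subtraction
}

// Near the wrap point: start just below 2^32, end just after wrapping.
console.log(elapsedUint32(0xFFFFFF00, 0x00000100)); // 512
console.log(elapsedUint32(100, 600));               // 500, the normal case
```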
For the profiler case, djvj told me that we can change the timestamps' data type without any penalties to the circular buffer or profile entries size. He's CCd here.

For the framerate actor, while performance.now() is cool, ideally PRMJ_Now would be exposed to js land, where it can be queried. I don't want us to use anything other than the same time measurement system everywhere, and this includes framerate.

I also don't want us to deal with multiple start times. There should be a single epoch, and a single time we query a start time (that is, when a recording starts). This will only be used to make the UI pretty, not the data "correct".
Great write up, Tom. 

Consistent epoch, meaning if the profiler starts it, at 0, a second later the markers start (at 1 000 000 in microseconds)? Lagging start times is fine as long as X from the profiler is the same moment as X from another source. 

I think overflow is fine; in the event the profiler does not stop, we'll still have some start time that can handle adding MAX_TIME to values less than STARTTIME for normalization.

And yes, is this accessible to JS land?
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #7)
> Great write up, Tom. 
> 
> Consistent epoch, meaning if the profiler starts it, at 0, a second later
> the markers start (at 1 000 000 in microseconds)? Lagging start times is
> fine as long as X from the profiler is the same moment as X from another
> source. 

A consistent epoch just means that the times you get can be directly compared.

Having different epochs can be ok if you can find the epoch of each time
source so you can offset the various times to be on the same scale.  But right
now there's no way to, e.g., get sStartTime, which is what the profiler uses
as an epoch.

The gross part here is that if we wind up using uint32_t, then due to the
wrapping business we'll need a bit of processing before the times can be
compared.  However, it's sufficient to get a single start time on the server
and use that -- you won't need a separate start time per actor or anything
like that.

> And yes, is this accessible to JS land?

If everything switches to the same time source then nsIDocShell.now() will
be fine.
(In reply to Tom Tromey :tromey from comment #5)

> In timeline we've been using TimeDuration with TimeStamp::ProcessCreation
> as the epoch; and representing this to js as DOMHighResTimeStamp --
> basically a "double" in units of milliseconds (but we preserve fractional
> bits so that the effective resolution is higher).

I realized later that this confuses two different ideas.
One concept is the units.  This is basically what "1" means.
DOMHighResTimeStamp has units of milliseconds.
The other concept is the resolution -- how accurate the value is.
Here I think the values are accurate at least to microseconds.
(In reply to Victor Porof [:vporof][:vp] from comment #6)
> For the profiler case, djvj told me that we can change the timestamps' data
> type without any penalties to the circular buffer or profile entries size.
> He's CCd here.

That's not true, right now the 't' tag in the circular buffer is a float and is unioned with pointer types. If we change it to double, or int64 or whatever, it'll double the size of circular buffer entries on 32bit.
(In reply to Shu-yu Guo [:shu] from comment #10)
> (In reply to Victor Porof [:vporof][:vp] from comment #6)
> > For the profiler case, djvj told me that we can change the timestamps' data
> > type without any penalties to the circular buffer or profile entries size.
> > He's CCd here.
> 
> That's not true, right now the 't' tag in the circular buffer is a float and
> is unioned with pointer types. If we change it to double, or int64 or
> whatever, it'll double the size of circular buffer entries on 32bit.

Oh, nevermind, I see Tom is proposing 32bit Unix epoch in ms.
Isn't PRMJ_Now SpiderMonkey internal?
(In reply to Shu-yu Guo [:shu] from comment #12)
> Isn't PRMJ_Now SpiderMonkey internal?

It is exported as JS_Now.
So it sounds like the PRMJ_Now stuff will work, except we don't have a plan for getting that in the profiler -- it sounded like there were two options:

1) expose PRMJ_Now to Gecko
2) Use another kind of timestamp that'll work with 32bit and not inflate the buffer size?

Tom, Shu, what can we do for this? This is the last piece of the puzzle to get started on all the syncing.
Flags: needinfo?(ttromey)
Flags: needinfo?(shu)
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #14)
> So it sounds like the PRMJ_Now stuff will work, except we don't have a plan
> for getting that in the profiler -- it sounded like there were two options:
> 
> 1) expose PRMJ_Now to Gecko

This isn't an issue, PRMJ_Now is exposed as JS_Now.

> 2) Use another kind of timestamp that'll work with 32bit and not inflate the
> buffer size?
> 
> Tom, Shu, what can we do for this? This is the last piece of the puzzle to
> get started on all the syncing.

There are two choices for the profiler.

One is to increase the size of the profile elements to 64 bits on 32 bit
machines.  I am not sure if this is possible, Shu should say.

The other option is to truncate the 64 bit times to 32 bits.  To make this
work sanely, what we'll need is a way to get the start time from the server --
that is, a server time in this format that comes before any of the tools
have been started.

This is needed to deal with the case where the experiment is started near
the time where the 32 bit value wraps.
Flags: needinfo?(ttromey)
(In reply to Tom Tromey :tromey from comment #15)

> that is, a server time in this format that comes before any of the tools
> have been started.

Sorry, not tools.
What I mean is that the user presses the "start" button on the profile tool.
This causes some things to happen on the server -- enabling the timeline
in the docshell, etc.
What we need is a server-generated time corresponding roughly to this button
press, that is guaranteed to happen before things like turning on the timeline.
We discussed this on irc and agreed to press forward with the plan of
using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use
a 64-bit time type, even on 32 bit machines.
(In reply to Tom Tromey :tromey from comment #17)
> We discussed this on irc and agreed to press forward with the plan of
> using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use
> a 64-bit time type, even on 32 bit machines.

After attempting to convert the profiler in bug 1159486, I think we have
to rethink this plan.  The profiler just touches too many places, including
getting timestamps from other general platform facilities that aren't
directly profiler-related, e.g., VsyncDispatcher.

So, I think we should pursue a two-part plan.

First, use the same epoch for all things (process creation time) and
change the allocation times to use some sort of hack to work around
the lack of TimeStamp in SpiderMonkey.

Second, done separately to avoid schedule crunch, try to move TimeStamp
to mfbt and remove any hacks done for part 1.
Flags: needinfo?(shu)
First patch that will sync all the time sources once all epochs are common throughout sources. Will add tests once the underlying dependencies are good to go.
No longer blocks: 1172180
Depends on: 1172180
Whiteboard: [devedition-40] → [polish-backlog]
Trying out the patch in 1172180 with this in context -- seems that the markers and profiler data are in sync. Framerate data doesn't seem to line up.

http://i.imgur.com/cdSYnvN.png
No longer blocks: perf-tools-fx42
Paul, would you mind checking this out to see if things are sync'd for your use case?
Flags: needinfo?(paul)
It works! Awesome! Thanks a lot.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(paul)
Resolution: --- → WORKSFORME
Product: Firefox → DevTools