Closed Bug 1152829 Opened 6 years ago Closed 5 years ago

Performance sources are not properly synced

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect, P1)

x86_64
Windows NT
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: paul, Assigned: jsantell)

References

(Blocks 1 open bug)

Details

(Whiteboard: [polish-backlog])

Attachments

(3 files)

Attached image screenshot
See the attached screenshot for example. The FPS drop happens before any activity in the flame chart.
Attached image 30ms delay
Happens on my MBP as well.

Timeline data and FPS look aligned. But profiler data are "late", off by ~30ms.
Summary: I'm not sure the FPS graph is properly synced with profiler data → FPS graph is not properly synced with profiler data
Priority: -- → P1
fitzgen noticed that the flamegraph is off with the waterfall as well. Based on the data sources, the waterfall/timeline have one set of times, and the profiler/flamegraph another. It's especially obvious when selecting a single JS function call marker in the waterfall: the flame graph shape doesn't correlate with it very well.

Victor, you did a lot of the timing stuff, pinging you here 'cause it's pretty important.
Flags: needinfo?(vporof)
We measure time in 4 different ways. This shouldn't theoretically matter, since we normalize everything to 0, but there are some subtleties that aren't handled yet and can cause bugs.

+-----------+
|Big picture|
+-----------+

1. Markers and memory consumption measurements
==============================================
  These times come from the timeline actor, namely the `now()` method on nsIDocShell. Every time markers and memory measurements are stored, the current time is attached via that docshell method. The recording's start time is also saved initially via the same method; all time values are normalized to 0 against it on the frontend as soon as they are received (before displaying).

2. Profiler data
================
  These times come from the profiler actor; each sample has a current time property, relative to when the profiler module was started. Since the profiler runs continuously, the profiler module itself may have been started before the current recording. The recording's start time is saved initially via the `getElapsedTime()` method on nsIProfiler; all time values are normalized to 0 against it on the frontend as soon as they are received (before displaying).

3. Framerate data
=================
  These values are recorded every time the refresh driver ticks, inside a requestAnimationFrame loop, via `performance.now()` from the content window. The recording's start time is saved initially via the same method; all time values are normalized to 0 immediately, on the actor, so the frontend receives them already normalized.

4. Allocations data
===================
  These times are unix timestamps IIRC, in microseconds. The recording's start time is saved initially via `Date.now()`; all time values are normalized to 0 against it (after converting to milliseconds) on the frontend as soon as they are received (before displaying).

To recap: 
* the markers and memory consumption times start from 0, normalized via the `nsIDocShell.now()` method.
* the profiler samples start from 0 (or later, if the circular buffer has overflowed), normalized and sliced via the `nsIProfiler.getElapsedTime()` method.
* the framerate times start from 0, normalized via the content's `performance.now()` method.
* the allocations times start from 0, normalized via `Date.now()` after converting to milliseconds.
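The per-source normalization described above can be sketched as follows. This is a minimal illustration of the scheme, not the actual actor code; the function and variable names are hypothetical.

```javascript
// Minimal sketch: each source records its own start time, then shifts
// every sample so the recording begins at 0.

// e.g. markers/memory: startTime obtained from nsIDocShell.now()
function normalize(samples, startTime) {
  return samples.map(t => t - startTime);
}

// Allocations arrive as microsecond unix timestamps, so they are
// converted to milliseconds before the same normalization.
function normalizeAllocations(timestampsUs, startTimeMs) {
  return timestampsUs.map(us => us / 1000 - startTimeMs);
}
```

Note that each source subtracts its *own* start time, which is exactly why the asynchronous-start-times problem below arises: the zeros don't coincide.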

+--------+
|Problems|
+--------+

1. Asynchronous start times
===========================
  Actors have different start times. Not all "zeros" after normalization fall at the exact same point in time, because there's some delay between starting each actor, both because of protocol asyncness and because each startup procedure takes actual milliseconds to finish. These deltas add up, and there's no mechanism in place to handle them at the moment.

2. Inconsistent accuracy
========================
  Since we record time in 4 different ways, each way might behave subtly differently or have internal problems that we may or may not know of. For example, Tom told me a while ago that PR_Now is wildly inaccurate, by even a few seconds on some platforms. Furthermore, I'm not sure how good Date.now() is in JavaScript land. Since we can't really record the time in exactly the same way for all the different actors, we need to investigate these issues more closely.


tl;dr: Recording time is hard and we're doing it in 4 different ways at different points in time.
Flags: needinfo?(vporof)
Summary: FPS graph is not properly synced with profiler data → Performance sources are not properly synced
Whiteboard: [devedition-40]
Assignee: nobody → jsantell
We discussed this in email a bit.

The ideal solution would be to use the same epoch and resolution
across all tools.

Inside Firefox there are 3 time sources: PR_Now (out of the question
due to various issues, see bug 363258), PRMJ_Now, and TimeDuration.

In timeline we've been using TimeDuration with TimeStamp::ProcessCreation
as the epoch; and representing this to js as DOMHighResTimeStamp --
basically a "double" in units of milliseconds (but we preserve fractional
bits so that the effective resolution is higher).

TimeDuration is a bit of a painful choice because it isn't available
in SpiderMonkey and it has a non-trivial representation.

Another wrinkle is that entries in the profiler circular buffer are
word-sized.  Currently the profiler casts a TimeDuration's double
to a float:

https://dxr.mozilla.org/mozilla-central/source/tools/profiler/ProfileEntry.cpp?from=ProfileEntry.cpp#615

One more issue is overflow.  A float has 24 mantissa bits, so it can
exactly represent integers only up to 16777216.  Considered as milliseconds,
that is less than 5 hours.  A consequence of this is that simply changing
the profiler to use ProcessCreation would lose precision after Firefox
had been running for a while.
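The precision loss is easy to demonstrate from JS with `Math.fround`, which rounds a double to the nearest 32-bit float and so simulates the profiler's double-to-float cast:

```javascript
// A 32-bit float has a 24-bit significand, so integers above 2^24
// (16777216) can no longer all be represented exactly.
const limit = Math.pow(2, 24);                      // 16777216
console.log(Math.fround(limit) === limit);          // true: still exact
console.log(Math.fround(limit + 1) === limit + 1);  // false: rounded away
// 16777216 ms is about 4.66 hours:
console.log(limit / (1000 * 60 * 60));
```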


One way out of all this would be to use PRMJ_Now everywhere, with caveats.
As far as I can tell, PRMJ_Now doesn't suffer from the problems of PR_Now;
it has acceptable resolution; it is available everywhere; and it has a
consistent epoch.  So, it meets all the criteria for a fix.

For the profiler, could we use uint32_t rather than float?
This would avoid growing the circular buffer and would also avoid
any precision loss problems.

32 bits of microseconds is about 4295 seconds, roughly 71.6 minutes.  This
would mean that the timestamps would wrap if a profiling experiment ran
longer than that, but that seems acceptable to me.  Here I'm assuming that
we do want micro- rather than millisecond resolution.

It seems cleanest to just accept the 32 bit restriction for all uses.
This would minimize confusion across tools.

I am not sure what to do about the framerate case.  Using performance.now
is a bit weird because it is reset when the page is reloaded, but it seems
very useful to support profiling across page reloads.


Some bit of js code will have to record the start time for use as an epoch,
doing an unsigned difference using the recorded times, to avoid wrapping
issues when the experiment is started near the point where the low 32 bits
of PRMJ_Now wrap.

Note that because PRMJ_Now has an internally consistent epoch, it is
tempting to just let each tool send back its own start time.  However, this
is a bit of a pain if one actor starts before the value wraps and another
after.  It's better to have a single start time, since otherwise one must
use a heuristic in that case.
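The wrap-safe unsigned difference mentioned above can be sketched in JS; `elapsedUint32` is a hypothetical helper, not code from any patch here. The `>>> 0` operator coerces the (possibly negative) difference back into the uint32 range, which is exactly modular subtraction, so the result is correct as long as the true interval is under 2^32.

```javascript
// Hypothetical helper: elapsed time between two uint32 timestamps
// that may have wrapped around zero.
function elapsedUint32(start, end) {
  return (end - start) >>> 0; // unsigned 32-bit modular subtraction
}

// Near the wrap point: start just below 2^32, end just after wrapping.
console.log(elapsedUint32(0xFFFFFF00, 0x00000100)); // 512
console.log(elapsedUint32(100, 600));               // 500, the normal case
```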
For the profiler case, djvj told me that we can change the timestamps' data type without any penalties to the circular buffer or profile entries size. He's CCd here.

For the framerate actor, while performance.now() is cool, ideally PRMJ_Now would be exposed to js land, where it can be queried. I don't want us to use anything other than the same time measurement system everywhere, and this includes framerate.

I also don't want us to deal with multiple start times. There should be a single epoch, and a single time we query a start time (that is, when a recording starts). This will only be used to make the UI pretty, not the data "correct".
Great write up, Tom. 

Consistent epoch, meaning if the profiler starts it, at 0, a second later the markers start (at 1 000 000 in microseconds)? Lagging start times is fine as long as X from the profiler is the same moment as X from another source. 

I think overflow is fine; in the event the profiler does not stop, we'll still have some start time that can handle adding MAX_TIME to values less than STARTTIME for normalization.

And yes, is this accessible to JS land?
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #7)
> Great write up, Tom. 
> 
> Consistent epoch, meaning if the profiler starts it, at 0, a second later
> the markers start (at 1 000 000 in microseconds)? Lagging start times is
> fine as long as X from the profiler is the same moment as X from another
> source. 

A consistent epoch just means that the times you get can be directly compared.

Having different epochs can be ok if you can find the epoch of each time
source so you can offset the various times to be on the same scale.  But right
now there's no way to, e.g., get sStartTime, which is what the profiler uses
as an epoch.

The gross part here is that if we wind up using uint32_t, then due to the
wrapping business we'll need a bit of processing before the times can be
compared.  However, it's sufficient to get a single start time on the server
and use that -- you won't need a separate start time per actor or anything
like that.

> And yes, is this accessible to JS land?

If everything switches to the same time source then nsIDocShell.now() will
be fine.
(In reply to Tom Tromey :tromey from comment #5)

> In timeline we've been using TimeDuration with TimeStamp::ProcessCreation
> as the epoch; and representing this to js as DOMHighResTimeStamp --
> basically a "double" in units of milliseconds (but we preserve fractional
> bits so that the effective resolution is higher).

I realized later that this confuses two different ideas.
One concept is the units.  This is basically what "1" means.
DOMHighResTimeStamp has units of milliseconds.
The other concept is the resolution -- how accurate the value is.
Here I think the values are accurate at least to microseconds.
(In reply to Victor Porof [:vporof][:vp] from comment #6)
> For the profiler case, djvj told me that we can change the timestamps' data
> type without any penalties to the circular buffer or profile entries size.
> He's CCd here.

That's not true, right now the 't' tag in the circular buffer is a float and is unioned with pointer types. If we change it to double, or int64 or whatever, it'll double the size of circular buffer entries on 32bit.
(In reply to Shu-yu Guo [:shu] from comment #10)
> (In reply to Victor Porof [:vporof][:vp] from comment #6)
> > For the profiler case, djvj told me that we can change the timestamps' data
> > type without any penalties to the circular buffer or profile entries size.
> > He's CCd here.
> 
> That's not true, right now the 't' tag in the circular buffer is a float and
> is unioned with pointer types. If we change it to double, or int64 or
> whatever, it'll double the size of circular buffer entries on 32bit.

Oh, nevermind, I see Tom is proposing 32bit Unix epoch in ms.
Isn't PRMJ_Now SpiderMonkey internal?
(In reply to Shu-yu Guo [:shu] from comment #12)
> Isn't PRMJ_Now SpiderMonkey internal?

It is exported as JS_Now.
So it sounds like the PRMJ_Now stuff will work, except we don't have a plan for getting that in the profiler -- it sounded like there were two options:

1) expose PRMJ_Now to Gecko
2) Use another kind of timestamp that'll work with 32bit and not inflate the buffer size?

Tom, Shu, what can we do for this? This is the last piece of the puzzle to get started on all the syncing.
Flags: needinfo?(ttromey)
Flags: needinfo?(shu)
(In reply to Jordan Santell [:jsantell] [@jsantell] from comment #14)
> So it sounds like the PRMJ_Now stuff will work, except we don't have a plan
> for getting that in the profiler -- it sounded like there were two options:
> 
> 1) expose PRMJ_Now to Gecko

This isn't an issue, PRMJ_Now is exposed as JS_Now.

> 2) Use another kind of timestamp that'll work with 32bit and not inflate the
> buffer size?
> 
> Tom, Shu, what can we do for this? This is the last piece of the puzzle to
> get started on all the syncing.

There are two choices for the profiler.

One is to increase the size of the profile elements to 64 bits on 32 bit
machines.  I am not sure if this is possible, Shu should say.

The other option is to truncate the 64 bit times to 32 bits.  To make this
work sanely, what we'll need is a way to get the start time from the server --
that is, a server time in this format that comes before any of the tools
have been started.

This is needed to deal with the case where the experiment is started near
the time where the 32 bit value wraps.
Flags: needinfo?(ttromey)
(In reply to Tom Tromey :tromey from comment #15)

> that is, a server time in this format that comes before any of the tools
> have been started.

Sorry, not tools.
What I mean is that the user presses the "start" button on the profile tool.
This causes some things to happen on the server -- enabling the timeline
in the docshell, etc.
What we need is a server-generated time corresponding roughly to this button
press, that is guaranteed to happen before things like turning on the timeline.
We discussed this on irc and agreed to press forward with the plan of
using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use
a 64-bit time type, even on 32 bit machines.
(In reply to Tom Tromey :tromey from comment #17)
> We discussed this on irc and agreed to press forward with the plan of
> using PRMJ_Now (aka JS_Now) everywhere, and changing the profiler to use
> a 64-bit time type, even on 32 bit machines.

After attempting to convert the profiler in bug 1159486, I think we have
to rethink this plan.  The profiler just touches too many places, including
getting timestamps from other general platform facilities that aren't
directly profiler-related, e.g., VsyncDispatcher.

So, I think we should pursue a two-part plan.

First, use the same epoch for all things (process creation time) and
change the allocation times to use some sort of hack to work around
the lack of TimeStamp in SpiderMonkey.

Second, done separately to avoid schedule crunch, try to move TimeStamp
to mfbt and remove any hacks done for part 1.
Flags: needinfo?(shu)
First patch that will sync all the time sources once all epochs are common throughout sources. Will add tests once the underlying dependencies are good to go.
No longer blocks: 1172180
Depends on: 1172180
Whiteboard: [devedition-40] → [polish-backlog]
Trying out the patch in 1172180 with this in context -- seems that the markers and profiler data are in sync. Framerate data doesn't seem to line up.

http://i.imgur.com/cdSYnvN.png
No longer blocks: perf-tools-fx42
Paul, would you mind checking this out to see if things are sync'd for your use case?
Flags: needinfo?(paul)
It works! Awesome! Thanks a lot.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(paul)
Resolution: --- → WORKSFORME
Product: Firefox → DevTools