Closed Bug 1384688 Opened 7 years ago Closed 7 years ago

Include information about gaps

Categories

(Core :: Gecko Profiler, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

The profile data from the different threads and different processes can occupy different time ranges. This can have a variety of reasons:
 - Processes were created and shut down at certain times
 - Threads inside those processes were created and shut down at certain times
 - The profile buffer was full; different processes can have different amounts of data in the buffer.

These things create gaps in the timeline if data from many threads and processes is viewed in the same timeline. Those gaps can be rather confusing.

I'd like to include enough information in the profile so that the reason for each of these gaps can be displayed in the UI.

For a given thread, the UI could look like this (in one line):

[--Process didn't exist--][--Thread didn't exist--][--Buffer overwritten--]
////data////[--Thread shut down--][--Process shut down--]

For ranges where we don't collect data because we're capturing a profile, we could have a [--Capturing profile--] marker. And for ranges where the profiler is paused, we could have [--Profiler paused--] marker.

Here's how I would expose this information in the profile JSON:

 - Thread startup + shutdown: Each thread would get two new properties,
   registerTime and unregisterTime, which will contain timestamps in milliseconds
   since process start. unregisterTime will be null if the thread is still alive.
 - Process startup + shutdown: The processProfile.meta object would get one new
   property, shutdownTime, which is the shutdown time in milliseconds since process
   start, or null if the process is still alive.
   meta.startTime already exists, and it's an absolute timestamp.
 - Paused ranges and profile collection:
   First I thought these could be markers, but markers are per-thread, whereas
   this is something that applies to the whole process. So I think we should add a
   list of pausedRanges to the profile, as a sibling property to the threads list.
   Or maybe a noSamplesReasons field:
   noSamplesReasons: [
     { startTime: ..., endTime: ..., reason: "paused" / "collecting" },
     ...
   ]
r+ on this approach!
The corresponding perf.html change is in https://github.com/devtools-html/perf.html/pull/487 .
> meta.startTime already exists, and it's an absolute timestamp.

I seem to recall a bug open about trying to get rid of this field because it's duplicated in the parent process and all child processes. But I could be mistaken.
Comment on attachment 8891118 [details]
Bug 1384688 - Add thread register and unregister time.

https://reviewboard.mozilla.org/r/162294/#review167622

::: tools/profiler/core/ThreadInfo.cpp:155
(Diff revision 1)
>                          int aThreadId,
>                          const ProfileBuffer& aBuffer,
>                          SpliceableJSONWriter& aWriter,
>                          const TimeStamp& aProcessStartTime,
> +                        const TimeStamp& aRegisterTime,
> +                        const TimeStamp& aUnregisterTime,

Oh dear, even more arguments to this function.

::: tools/profiler/core/platform.cpp:2946
(Diff revision 1)
>    int i;
>    ThreadInfo* info = FindLiveThreadInfo(lock, &i);
>    MOZ_RELEASE_ASSERT(info == TLSInfo::Info(lock));
>    if (info) {
>      DEBUG_LOG("profiler_unregister_thread: %s", info->Name());
> +    info->NotifyUnregistered();

Move this into the "then" branch. No point notifying a ThreadInfo just before deleting it.
Attachment #8891118 - Flags: review?(n.nethercote) → review+
(In reply to Nicholas Nethercote [:njn] from comment #8)
> > meta.startTime already exists, and it's an absolute timestamp.
> 
> I seem to recall a bug open about trying to get rid of this field because
> it's duplicated in the parent process and all child processes. But I could
> be mistaken.

Maybe bug 1362109?
Comment on attachment 8891119 [details]
Bug 1384688 - Add an aIsShuttingDown argument to profiler_get_profile() and to profiler_stream_json_for_this_process().

https://reviewboard.mozilla.org/r/162296/#review167628

::: tools/profiler/core/platform.cpp:1448
(Diff revision 1)
> +  // all other (non-"startTime") times anywhere in the profile JSON.
> +  if (aShutdownTime) {
> +    aWriter.DoubleProperty("shutdownTime",
> +      (aShutdownTime - CorePS::ProcessStartTime()).ToMilliseconds());
> +  } else {
> +    aWriter.NullProperty("shutdownTime");

Would it make more sense to just omit the property? Not sure.

::: tools/profiler/core/platform.cpp:2421
(Diff revision 1)
> +
> +  return b.WriteFunc()->CopyData();
> +}
> +
> +UniquePtr<char[]>
> +profiler_get_shutdown_profile()

The duplication between this function and profiler_get_profile() is awkward. I see two possible fixes:
- Remove profiler_get_shutdown_profile() and add an aIsShuttingDown argument to profiler_get_profile().
- Keep them separate, but introduce a helper function with an aIsShuttingDown argument. The two public functions would immediately call into the helper, but with different arguments.

::: tools/profiler/core/platform.cpp:2537
(Diff revision 1)
>    PR_SetEnv("MOZ_PROFILER_STARTUP_FILTERS=");
>  }
>  
>  static void
> -locked_profiler_save_profile_to_file(PSLockRef aLock, const char* aFilename)
> +locked_profiler_save_profile_to_file(PSLockRef aLock, const char* aFilename,
> +                                     bool aIsShuttingDown)

Give aIsShuttingDown a default value of false?

::: tools/profiler/public/GeckoProfiler.h:458
(Diff revision 1)
>  
>  // Get the profile encoded as a JSON string. A no-op (returning nullptr) if the
>  // profiler is inactive.
>  PROFILER_FUNC(
>    mozilla::UniquePtr<char[]> profiler_get_profile(double aSinceTime = 0),
>    nullptr)

Random question: Do we ever use the aSinceTime functionality?
Attachment #8891119 - Flags: review?(n.nethercote) → review+
Comment on attachment 8891120 [details]
Bug 1384688 - Pass aIsShuttingDown == true when obtaining shutdown profiles.

https://reviewboard.mozilla.org/r/162298/#review167630

::: tools/profiler/gecko/ProfilerChild.cpp:82
(Diff revision 1)
>  static nsCString
> -CollectProfileOrEmptyString()
> +CollectProfileOrEmptyString(bool aIsShuttingDown)
>  {
>    nsCString profileCString;
> -  UniquePtr<char[]> profile = profiler_get_profile();
> +  UniquePtr<char[]> profile =
> +    aIsShuttingDown ? profiler_get_shutdown_profile() : profiler_get_profile();

This will be simpler if profiler_get_profile() has an aIsShuttingDown argument.

::: tools/profiler/gecko/ProfilerChild.cpp:94
(Diff revision 1)
>  }
>  
>  mozilla::ipc::IPCResult
>  ProfilerChild::RecvGatherProfile(GatherProfileResolver&& aResolve)
>  {
> -  aResolve(CollectProfileOrEmptyString());
> +  aResolve(CollectProfileOrEmptyString(false));

A |/* isShuttingDown */ false| comment would be nice.

::: tools/profiler/gecko/ProfilerChild.cpp:115
(Diff revision 1)
>  }
>  
>  nsCString
>  ProfilerChild::GrabShutdownProfile()
>  {
> -  return CollectProfileOrEmptyString();
> +  return CollectProfileOrEmptyString(true);

Ditto
Attachment #8891120 - Flags: review?(n.nethercote) → review+
Comment on attachment 8891121 [details]
Bug 1384688 - Add a pausedRanges field to the profile JSON of each process which lists time ranges during which the profiler was paused or collecting a profile.

https://reviewboard.mozilla.org/r/162300/#review167634

::: tools/profiler/core/ProfileBufferEntry.h:46
(Diff revision 1)
>    macro(Time,                  double) \
> -  macro(UnsharedMemory,        double)
> +  macro(UnsharedMemory,        double) \
> +  macro(CollectionStart,       double) \
> +  macro(CollectionEnd,         double) \
> +  macro(Pause,                 double) \
> +  macro(Resume,                double)

These are currently in alphabetical order; please maintain that. Alternatively, you could reorder them to match the order in the grammar in ProfileBufferEntry.cpp.

::: tools/profiler/core/ProfileBufferEntry.cpp:594
(Diff revision 1)
>    const int mWritePos;
>    const int mEntrySize;
>  };
>  
>  // Each sample is made up of multiple ProfileBuffer entries. The following
>  // grammar shows legal sequences.

This description is now inaccurate. Please change it to something like this: 'The following grammar shows legal sequences of profile buffer entries. The sequences beginning with a ThreadId entry are known as "samples".'

::: tools/profiler/core/ProfileBufferEntry.cpp:725
(Diff revision 1)
>  
>    EntryGetter e(*this);
>    bool seenFirstSample = false;
>  
>    // This block skips entries until we find the start of the next sample. This
>    // is useful in two situations.

I realize this comment isn't quite right. Can you change the first sentence to this: "The

::: tools/profiler/core/ProfileBufferEntry.cpp:904
(Diff revision 1)
>  
>      WriteSample(aWriter, sample);
> +
> +    // Skip over Pause / Resume / CollectionStart / CollectionEnd.
> +    while (e.Has()) {
> +      const ProfileBufferEntry& entry = e.Get();

No need for the loop; just |goto skip_to_next_sample;|!

Once you do that, please update the "useful in two situations" comment above to say "three situations", and add something like "- We skip range Pause, Resume, CollectionStart, and CollectionEnd entries between samples."

(BTW, imagine how much fun it would have been to add these new entries if I hadn't rewritten this function :)

::: tools/profiler/core/ProfileBufferEntry.cpp:977
(Diff revision 1)
> +    if (e.Get().IsPause()) {
> +      currentPauseStartTime = Some(e.Get().u.mDouble);
> +    } else if (e.Get().IsResume()) {
> +      AddPausedRange(aWriter, "profiler-paused",
> +                     currentPauseStartTime, Some(e.Get().u.mDouble));
> +      currentPauseStartTime = Nothing();

Both this function and AddPausedRange() are awkward because the "reason" field is separate from the "startTime" and "endTime" fields.

I suggest making the JSON more similar to the ProfileBuffer format, with "pauseTime", "resumeTime", "collectionStartTime", "collectionEndTime" fields. Then you can greatly simplify this function and remove AddPausedRange() altogether. It'll also be consistent with the "registerTime" and "unregisterTime" fields used for threads.

::: tools/profiler/core/ProfileBufferEntry.cpp:990
(Diff revision 1)
> +    e.Next();
> +  }
> +
> +  if (currentPauseStartTime) {
> +    AddPausedRange(aWriter, "profiler-paused",
> +                    currentPauseStartTime, Nothing());

Incorrect indent.

::: tools/profiler/core/ProfileBufferEntry.cpp:994
(Diff revision 1)
> +    AddPausedRange(aWriter, "profiler-paused",
> +                    currentPauseStartTime, Nothing());
> +  }
> +  if (currentCollectionStartTime) {
> +    AddPausedRange(aWriter, "collecting",
> +                    currentCollectionStartTime, Nothing());

Ditto.

::: tools/profiler/core/platform.cpp:1670
(Diff revision 1)
> +    (TimeStamp::Now() - CorePS::ProcessStartTime()).ToMilliseconds();
> +
> +  // Record timestamps for the collection into the buffer, so that consumers
> +  // know why we didn't collect any samples for its duration.
> +  buffer.AddEntry(ProfileBufferEntry::CollectionStart(collectionStart));
> +  buffer.AddEntry(ProfileBufferEntry::CollectionEnd(collectionEnd));

I'm a bit uncertain here. We just finished streaming the buffer to JSON, and then we add new collection range entries to the buffer. Does that mean the range points won't be seen until the next time we stream to JSON?

This may be reasonable behaviour, but an explanatory comment would be helpful.

::: tools/profiler/core/platform.cpp:2929
(Diff revision 1)
>      if (!ActivePS::Exists(lock)) {
>        return;
>      }
>  
> +    double timeStamp =
> +      (TimeStamp::Now() - CorePS::ProcessStartTime()).ToMilliseconds();

This expression is so common it might be worth factoring out into its own function.
Attachment #8891121 - Flags: review?(n.nethercote) → review+
Comment on attachment 8891122 [details]
Bug 1384688 - Update profile version to 8.

https://reviewboard.mozilla.org/r/162302/#review167646

r=me, but the suggested JSON format changes in my other reviews will require tweaking the profiler extension.
Attachment #8891122 - Flags: review?(n.nethercote) → review+
BTW, I think this will make profile reading much nicer. I look forward to seeing it in action!
> Maybe bug 1362109?

That's the one. I see it's been WONTFIXED; carry on!
Comment on attachment 8891121 [details]
Bug 1384688 - Add a pausedRanges field to the profile JSON of each process which lists time ranges during which the profiler was paused or collecting a profile.

https://reviewboard.mozilla.org/r/162300/#review167634

> No need for the loop; just |goto skip_to_next_sample;|!
> 
> Once you do that, please update the "useful in two situations" comment above to say "three situations", and add something like "- We skip range Pause, Resume, CollectionStart, and CollectionEnd entries between samples."
> 
> (BTW, imagine how much fun it would have been to add these new entries if I hadn't rewritten this function :)

I even had that at some stage when writing this patch, but for some reason decided against it. I'll go back to it.

However, this means that the outer "while (e.Has())" loop effectively becomes an if statement, because there is no way to get a second iteration. So I'll change it to an if statement.

Now the whole function mostly becomes a loop that is implemented using goto. I then tried replacing the whole thing with a for(;;) loop. All the "goto skip_to_next_sample;" statements can then become "continue" statements... at least, they could, if only there wasn't the ERROR_AND_SKIP_TO_NEXT_SAMPLE macro which can't use continue in its implementation, because it needs to wrap its contents in a do { } while (0), and so "continue" will continue the wrong loop. I suppose I could give the outer loop a label, but at that point I might as well keep using goto.

> Both this function and AddPausedRange() are awkward because the "reason" field is separate from the "startTime" and "endTime" fields.
> 
> I suggest making the JSON more similar to the ProfileBuffer format, with "pauseTime", "resumeTime", "collectionStartTime", "collectionEndTime" fields. Then you can greatly simplify this function and remove AddPausedRange() altogether. It'll also be consistent with the "registerTime" and "unregisterTime" fields used for threads.

I don't understand this suggestion. Can you give me an example with multiple pauses?

> I'm a bit uncertain here. We just finished streaming the buffer to JSON, and then we add new collection range entries to the buffer. Does that mean the range points won't be seen until the next time we stream to JSON?
> 
> This may be reasonable behaviour, but an explanatory comment would be helpful.

Yes, that's the intention. I'll add a comment.

If you capture a profile, then from the perspective of the reported profile, time ended at the instant that you called getProfile. So the collection hasn't even started yet.

But if you collect the profile again, at some later point, then you want to know about the previous collection.
> Now the whole function mostly becomes a loop that is implemented using goto.
> I then tried replacing the whole thing with a for(;;) loop. All the "goto
> skip_to_next_sample;" statements can then become "continue" statements... at
> least, they could, if only there wasn't the ERROR_AND_SKIP_TO_NEXT_SAMPLE
> macro which can't use continue in its implementation, because it needs to
> wrap its contents in a do { } while (0), and so "continue" will continue the
> wrong loop. I suppose I could give the outer loop a label, but at that point
> I might as well keep using goto.

I'm fine if you want to change ERROR_AND_SKIP_TO_NEXT_SAMPLE so it's just encased in "{ }" instead of "do { } while (0)". In which case you should rename it to ERROR_AND_CONTINUE to make clear that it continues. Alternatively, move the continue out of the macro so it's not hidden.


> I don't understand this suggestion. Can you give me an example with multiple
> pauses?

Imagine you have this sequence of entries in the ProfileBuffer:

> Resume 125
> Pause 127
> Resume 128
> Pause 129

The current code will produce JSON like this:

> "endTime": 125,
> "reason": "profiler-paused",
>
> "startTime": 127,
> "endTime": 128,
> "reason": "profiler-paused",
>
> "startTime": 129,
> "reason": "profiler-paused",

I suggest making the JSON mirror the ProfileBuffer exactly:

> "resumeTime": 125,
> "pauseTime": 127,
> "resumeTime": 128,
> "pauseTime": 129,

That matches how the thread register/unregister fields work -- they don't have a "reason" field.

Now, you might still have to do pair matching in the profiler extension, but I think it might be more natural at that point.
(In reply to Nicholas Nethercote [:njn] from comment #18)
> > I don't understand this suggestion. Can you give me an example with multiple
> > pauses?
> 
> Imagine you have this sequence of entries in the ProfileBuffer:
> 
> > Resume 125
> > Pause 127
> > Resume 128
> > Pause 129
> 
> The current code will produce JSON like this:
> 
> > "endTime": 125,
> > "reason": "profiler-paused",
> > 
> > "startTime": 127,
> > "endTime": 128,
> > "reason": "profiler-paused",
> > 
> > "startTime": 129,
> > "reason": "profiler-paused",

It will produce this:

> [
>   { "startTime": null, "endTime": 125, "reason": "profiler-paused" },
>   { "startTime": 127, "endTime": 128, "reason": "profiler-paused" },
>   { "startTime": 129, "endTime": null, "reason": "profiler-paused" }
> ]

The array and the objects are important.

> I suggest making the JSON mirror the ProfileBuffer exactly:
> 
> > "resumeTime": 125,
> > "pauseTime": 127,
> > "resumeTime": 128,
> > "pauseTime": 129,

This doesn't work if all of these properties go in the same object; objects can't have duplicate keys.

I suppose it would work if you did this:

> [
>   { "resumeTime": 125 },
>   { "pauseTime": 127 },
>   { "resumeTime": 128 },
>   { "pauseTime": 129 }
> ]

but in my opinion this makes for a rather awkward format.

I'd like to keep the format as originally suggested.
(In reply to Nicholas Nethercote [:njn] from comment #18)
> That matches how the thread register/unregister fields work -- they don't
> have a "reason" field.

The reason we can do this for register/unregister is that those are unique - a given thread can't be registered or unregistered more than once.
Oh, one other thing I wanted to mention: The fact that I can just insert entries into the buffer during profile collection, and *know* that I'm free to do that because I'm currently holding the profiler state lock, is brilliant. I wouldn't be able to have this peace of mind if you hadn't done all the hard work to clarify the profiler threading and state management.
Comment on attachment 8891119 [details]
Bug 1384688 - Add an aIsShuttingDown argument to profiler_get_profile() and to profiler_stream_json_for_this_process().

https://reviewboard.mozilla.org/r/162296/#review167628

> Would it make more sense to just omit the property? Not sure.

I prefer having the property always.

> The duplication between this function and profiler_get_profile() is awkward. I see two possible fixes:
> - Remove profiler_get_shutdown_profile() and add an aIsShuttingDown argument to profiler_get_profile().
> - Keep them separate, but introduce a helper function with an aIsShuttingDown argument. The two public functions would immediately call into the helper, but with different arguments.

I went for the first option.

> Random question: Do we ever use the aSinceTime functionality?

The current devtools profiler uses it. It wants to show results during recording, so it periodically requests a profile and then merges the chunks in the frontend. When obtaining a profile chunk, it only wants data that's new since the last chunk.
Comment on attachment 8891121 [details]
Bug 1384688 - Add a pausedRanges field to the profile JSON of each process which lists time ranges during which the profiler was paused or collecting a profile.

https://reviewboard.mozilla.org/r/162300/#review167634

> These are currently in alphabetical order; please maintain that. Alternatively, you could reorder them to match the order in the grammar in ProfileBufferEntry.cpp.

I put them in alphabetical order.

> This expression is so common it might be worth factoring out into its own function.

One might even want to call this function profiler_time()... which already exists. (I was under the impression that you had removed it at some point!)
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/ed86b892433b
Add thread register and unregister time. r=njn
https://hg.mozilla.org/integration/autoland/rev/e9d92bc3be81
Add an aIsShuttingDown argument to profiler_get_profile() and to profiler_stream_json_for_this_process(). r=njn
https://hg.mozilla.org/integration/autoland/rev/3ad8f1ba9de6
Pass aIsShuttingDown == true when obtaining shutdown profiles. r=njn
https://hg.mozilla.org/integration/autoland/rev/399d40cb5c1c
Add a pausedRanges field to the profile JSON of each process which lists time ranges during which the profiler was paused or collecting a profile. r=njn
https://hg.mozilla.org/integration/autoland/rev/802c36bf1b27
Update profile version to 8. r=njn
Backed out for bustage at tools/profiler/gecko/nsProfiler.cpp:573: the address of 'thisProcessFirstSampleTime' will always evaluate as 'true':

https://hg.mozilla.org/integration/autoland/rev/6521422ef0b74f255f46544e8589b8f815ebd417
https://hg.mozilla.org/integration/autoland/rev/6806dd90d02e0788e5663982c29ccc33daa8fe86
https://hg.mozilla.org/integration/autoland/rev/1ab0d2a28f1b7b73c462d24d8b0d8fa3f393233f
https://hg.mozilla.org/integration/autoland/rev/40d2ccee4fa895a9b83db74259b8305ce072b99e
https://hg.mozilla.org/integration/autoland/rev/b32c85d2d03181b02a1243af8e39f2b98a164f60

Push with bustage: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=802c36bf1b27ed53c8c396c6e081e3f0ab02b53f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=119652567&repo=autoland

[task 2017-07-31T18:07:13.261164Z] 18:07:13     INFO -  In file included from /home/worker/workspace/build/src/obj-firefox/tools/profiler/Unified_cpp_tools_profiler0.cpp:119:0:
[task 2017-07-31T18:07:13.261290Z] 18:07:13     INFO -  /home/worker/workspace/build/src/tools/profiler/gecko/nsProfiler.cpp: In member function 'RefPtr<mozilla::MozPromise<nsCString, nsresult, false> > nsProfiler::StartGathering(double)':
[task 2017-07-31T18:07:13.261372Z] 18:07:13     INFO -  /home/worker/workspace/build/src/tools/profiler/gecko/nsProfiler.cpp:573:73: error: the address of 'thisProcessFirstSampleTime' will always evaluate as 'true' [-Werror=address]
[task 2017-07-31T18:07:13.261413Z] 18:07:13     INFO -                                                &thisProcessFirstSampleTime)) {
[task 2017-07-31T18:07:13.261452Z] 18:07:13     INFO -                                                                           ^
[task 2017-07-31T18:07:13.261490Z] 18:07:13     INFO -  cc1plus: all warnings being treated as errors
Flags: needinfo?(mstange)
(In reply to Sebastian Hengst [:aryx][:archaeopteryx] (needinfo on intermittent or backout) from comment #35)
> /home/worker/workspace/build/src/tools/profiler/gecko/nsProfiler.cpp:573:73:
> error: the address of 'thisProcessFirstSampleTime' will always evaluate as
> 'true' [-Werror=address]

This is an excellent error and caught a real bug. I don't get this error locally.
Flags: needinfo?(mstange)
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/806ed8cd2c1b
Add thread register and unregister time. r=njn
https://hg.mozilla.org/integration/autoland/rev/39ae5bed4ee0
Add an aIsShuttingDown argument to profiler_get_profile() and to profiler_stream_json_for_this_process(). r=njn
https://hg.mozilla.org/integration/autoland/rev/ac7360e30fbb
Pass aIsShuttingDown == true when obtaining shutdown profiles. r=njn
https://hg.mozilla.org/integration/autoland/rev/b925ffe4c9cf
Add a pausedRanges field to the profile JSON of each process which lists time ranges during which the profiler was paused or collecting a profile. r=njn
https://hg.mozilla.org/integration/autoland/rev/c76dcba58c44
Update profile version to 8. r=njn
Depends on: 1431217
You need to log in before you can comment on or make changes to this bug.