Closed Bug 1159639 Opened 9 years ago Closed 6 years ago

Implement API to add performance marker

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(tracking-b2g:+)

RESOLVED WONTFIX
tracking-b2g +

People

(Reporter: jhao, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [perf-wanted])

Attachments

(1 file, 11 obsolete files)

Raptor let us add performance markers in Gaia, but we also have the need to add performance markers in Gecko code.

The interface could be very similar to the Gaia counterpart, e.g. mark(), measure().

We should connect our interface to the implementation of Raptor, so that these entries can be retrieved together with the markers in Gaia.
Can we reuse PROFILER_LABEL or PROFILER_MARKER for the mark(), and add another macro for measure()?
Patrick said that there may be some cases where we want a marker, but do not want it to be exposed to profiler. Also, the marker could be denser than the profiler markers.

I'll talk to each functional team, asking their use case and what their desired API look like.

Chiajung said it's good to combine PROFILER_LABEL or PROFILER_MARKER and our marker, because it can save duplicated work. That's also exactly what Ting-Yu told me. There are already several of those in the critical parts of graphics code.

Also, the API should be able to work well in both the b2g process and content processes. The markers could also be placed and measured in different threads.
I've talked to John (media team) and Szu-Yu (RIL team) respectively. They were OK with combining performance markers with PROFILER_LABEL/MARKER.

As for Patrick's concern about exposing too much to the profiler, Chiajung and John both mentioned that we could use a special tag to avoid this.
I was planning about the performance mark interface in Gecko, and has collected some use cases and suggestions these days.

At first, I planned to use the corresponding C++ implementation of performance.mark[1]. It was not easy because in order to get a nsPerformance object, we need to get a window object.

Then, Thinker reminded me that the purpose of this interface is to generate logcat for Raptor to parse. I looked at where [1] generates logcat, it's

    PERFLOG("Performance Entry: %s|%s|%s|%f|%f|%" PRIu64 "\n",
            uri.get(),
            NS_ConvertUTF16toUTF8(aEntry->GetEntryType()).get(),
            NS_ConvertUTF16toUTF8(aEntry->GetName()).get(),
            aEntry->StartTime(),
            aEntry->Duration(),
            static_cast<uint64_t>(PR_Now() / PR_USEC_PER_MSEC));

which would generate something like:

    Performance Entry: email.gaiamobile.org|mark|navigationLoaded|1754.750363|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|navigationInteractive|1754.869270|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|visuallyLoaded|1755.240624|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|contentInteractive|1755.449478|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|fullyLoaded|1755.719374|0.000000|1430794848954

The first column is URI, I think we could come up with some convention such as "network.protocol.rtsp" for each component.
The second column is the entry type, which are all "mark" if it's generated by performance.mark().
For all marks, the duration is 0 by definition, which explains the fourth column. Non-zero duration occurs when entries are produced by performance.measure().
The fifth column, as you can see, is the absolute time of the moment.

The above could be done without using nsPerformance. As for the third column, it represents the elapsed time counting from the start of navigation.
In Gecko, we may have to set a reference point of time by ourselves, but I'm not sure when it should be.

Or perhaps we should focus on the difference between the marks? (which is what performance.measure() does)

[1] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.h#341
[2] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.cpp#678
Hey Jonathan,

You are correct about reference points with regards to measurements. In order for User Timing/performance marks to determine how long something takes, there is always a start mark and an end mark. In your standard web page, the start mark is implicit: performance.timing.navigationStart. Any mark created is essentially an implicit *performance.measure* between the time the mark was created and navigationStart. It's strange; all marks are technically just measures with a hard-coded starting point.

Raptor uses this same principle, but changes the start mark away from navigationStart to a different performance mark generated by the homescreen: appStart. Instead of calculating the delta between navigationStart and a marker to generated the time, we generate the delta between appStart and the marker. It's very similar, it just makes times a little longer than before to account for application startup time done by the System and homescreen.

In order for marks/measures to mean anything, they need a reference point to base their duration on. Without an application start or navigationStart (since these belong to the window or app), you need a different reference point. I think the usage of a performance interface for generating marks and measures is fine, but how you go about it depends on this reference point. Without a user-defined point, you couldn't rely on High-Resolution time, instead you would need to use absolute times, i.e. epochs, to generate marks, and have a measure method that would do the delta between those marks. I can't see any way of using a high-resolution timer without first having a defined reference point.

I'm sure you know all this, but I guess I just wanted to clear up what needs to happen here:

1. Generate an absolute timestamp for every "mark", and then "measure" the delta between the timestamp. Going this route would not require a reference marker.

Or:

2. Generate a high-resolution time for every "mark", again using "measure" to calculate the difference. This would require that every mark have some reference point for which to generate the initial delta.
Thanks Eli for the detailed explanation.

I think I'll use absolute timestamp for now, until we ever come up with a reasonable reference point.

I'm planning to create an XPCOM object for this API.
Attached patch WIP patch (obsolete) — Splinter Review
We can now measure the time difference between two markers.
Attachment #8604031 - Attachment is obsolete: true
Attached patch Performance Mark API - V1 (obsolete) — Splinter Review
I set the reference point as the time when the XPCOM object is created.
It now produces logs compatible with Raptor.
Attached patch Sample usage of this API (obsolete) — Splinter Review
To add a performance mark in Gecko:

nsCOMPtr mPerformance = do_GetService(NS_PERFORMANCE_CONTRACTID);
mPerformance->Mark("sample_mark");

For example, in this sample patch I add two marks in MediaDecoderStateMachine, when it starts to decode metadata and first frame.
These code will be run when the booting animation appears.

In gaia folder, run |make raptor| and
|RUNS=1 APP=settings node tests/raptor/restart_b2g_test|
we'll see the results

Metric                                        Mean       Median     Min        Max        StdDev  p95
--------------------------------------------  ---------  ---------  ---------  ---------  ------  ---
restartb2g.loadEnd                            10476.290  10476.290  10476.290  10476.290  0.000   n/a
restartb2g.DECODER_STATE_DECODING_METADATA    19077.290  19077.290  19077.290  19077.290  0.000   n/a
restartb2g.DECODER_STATE_DECODING_FIRSTFRAME  19079.290  19079.290  19079.290  19079.290  0.000   n/a
restartb2g.homescreenStart                    24578.290  24578.290  24578.290  24578.290  0.000   n/a
restartb2g.navigationLoaded                   24913.290  24913.290  24913.290  24913.290  0.000   n/a
restartb2g.navigationInteractive              24914.290  24914.290  24914.290  24914.290  0.000   n/a
restartb2g.osLogoEnd                          25083.290  25083.290  25083.290  25083.290  0.000   n/a
Attachment #8606155 - Attachment is obsolete: true
Attached patch Performance Mark API - V1 (obsolete) — Splinter Review
Hi Eli,

I wanted to reuse the code of the User Timing API, but I found it hard to do so.
Currently, I didn't reuse any of them. I just tried to make it produce similar logs so that Raptor can parse them.

Could you recommend someone that I could ask feedback or review from?
I've talked to Thinker's team, but they weren't sure who's in charge with this either.

I'm also not really sure which folder this piece of code should be in.

P.S. This patch is essentially the same as attachment 8607420 [details] [diff] [review]. Some redundant lines are removed.

Thanks!
Attachment #8607420 - Attachment is obsolete: true
Flags: needinfo?(eperelman)
Hey :baku, do you have any thoughts on this?
Flags: needinfo?(eperelman) → needinfo?(amarchesini)
:jhao, I can review your code, but, going quickly through your patch, I see that you have done sometime similar to what I did for bug 1155761. Basically I implemented a PerformanceBase class that can be used in workers and main-thread in order to share the basic methods for the Performance Timing API.

Hopefully my patch should land to m-i this week.
Wondering if my patch can be useful and we can merge your code with mine somehow.
Flags: needinfo?(amarchesini) → needinfo?(jhao)
Hi Andrea,

It looks very useful. I think it would be great to merge our code.

Thanks a lot!
Depends on: 1155761
Flags: needinfo?(jhao)
tracking-b2g: --- → +
Keywords: perf
Whiteboard: [perf-wanted]
Hi Andrea,

Now that bug 1155761 is resolved, I want to make my class inherit PerformanceBase to avoid duplicated code, but there's one problem.

This assertion occured in PerformanceEntry and PerformanceMeasure:
MOZ_ASSERT(mParent || !NS_IsMainThread());

It forbids us to use Performance.Mark() in main thread, since we don't have mParent.
We wish to be able to use Performance.Mark() in every thread.

How do you suggest we deal with this problem?
Thanks very much!
Flags: needinfo?(amarchesini)
> This assertion occured in PerformanceEntry and PerformanceMeasure:
> MOZ_ASSERT(mParent || !NS_IsMainThread());

mParent is just use for GetParentObject(). Change this assertion to something that covers your usecase.
Flags: needinfo?(amarchesini)
Attached patch Reuse PerformanceBase (obsolete) — Splinter Review
In this patch I tried to reuse PerformanceBase. It works fine most of the time.
But, when I try to test it by adding a mark in MediaDecoderStateMachine, it will crash and keeps reopening b2g process.
It crashes in NS_CycleCollectorSuspect3. The crash reason is that the decoder thread didn't start any cycle collector.
This patch is to test on MediaDecoderStateMachine.
fixed a problem, but the situation is still the same
Attachment #8617188 - Attachment is obsolete: true
I have confirmed with JW. When we called the function |Mark()|, it creates a PerformanceMark. JW said that anything with cycle collection must be created in the main thread (or perhaps worker thread).

So, if we still want to reuse code by inheriting PerformanceBase and be able to use it out of main thread, we'll need two kinds of PerformanceMarks, one with cycle collection and one without.

Hi :baku, do you think we should do that?
Flags: needinfo?(amarchesini)
Yes, you are right. But you don't care about PerformanceEntry/PerformanceMark. Probably you can change the code so that we create PerformanceEntry only when needed and we store some non CCed data in the arrays of entries.
Flags: needinfo?(amarchesini)
Attached patch Performance mark in Gecko (obsolete) — Splinter Review
Hi :baku,
Could you review my patch?
Attachment #8607429 - Attachment is obsolete: true
Attachment #8607445 - Attachment is obsolete: true
Attachment #8617189 - Attachment is obsolete: true
Attachment #8617199 - Attachment is obsolete: true
Attachment #8624153 - Flags: review?(amarchesini)
Comment on attachment 8624153 [details] [diff] [review]
Performance mark in Gecko

Review of attachment 8624153 [details] [diff] [review]:
-----------------------------------------------------------------

You cannot remove the CC from PerformanceEntry.
What I meant was something like:

1. instead having mUserEntries create something like:

struct UserEntry {
  nsRefPtr<PerformanceEntry> mEntry; // Initially null
  EntryType mType; // This can be just Mark and Meausre
  nsString mName;
  DOMHighResTimeStamp mStartTime;
  DOMHighResTimeStamp mDuration;
}

nsTArray<UserEntry> mUserEntries;

So, basically InsertUserEntry() adds a new data struct without creating a UserEntry.mEntry object.
Then in GetEntry* methods (not exposed in your nsIPerformance idl) you call:

PerformanceEntry* GetOrCreatePerformanceEntry(UserEntry& aEntry);

that creates mEntry in case it's null using mType, mName, mStartTime and mDuration.
How does it sound?
Attachment #8624153 - Flags: review?(amarchesini) → review-
Attached patch Performance Mark API (V3) (obsolete) — Splinter Review
This patch can be compiled and pass the test "test_performance_user_timing_html" in my computer.

I am writing a chrome test as :baku suggested in Whistler.
Attachment #8624153 - Attachment is obsolete: true
Attached patch Chrome test (obsolete) — Splinter Review
Hi :baku,

I wrote this test, but when I ran it, it produced this error:

7 INFO TEST-UNEXPECTED-FAIL | dom/base/test/chrome/test_bug1159639.xul | uncaught exception - NS_ERROR_XPC_CANT_CREATE_WN: Component returned failure code: 0x80570019 (NS_ERROR_XPC_CANT_CREATE_WN) [nsIJSCID.getService] at chrome://mochitests/content/chrome/dom/base/test/chrome/test_bug1159639.xul:30

NS_ERROR_XPC_CANT_CREATE_WN seems to come from https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSID.cpp?from=XPCJSID.cpp&case=true#696 but I don't really understand why. And I can getService without problems in C++ code.

Do you have any idea what the reason is?

Thanks!
Attachment #8628287 - Flags: feedback?(amarchesini)
Jonathan, sorry for the delay. I hope to give you an answer for this week!
The previous patch is out-of-date. This one is rebased.
I tried xpcshell-test, and it still crashed when I tried to getService.

I wonder if I exposed the XPCOM module wrong.

Since baku is away until 11/8, I'll have to find someone familiar with XPCOM to help.
Attachment #8628102 - Attachment is obsolete: true
Attachment #8628287 - Attachment is obsolete: true
Attachment #8628287 - Flags: feedback?(amarchesini)
Status: NEW → ASSIGNED
Priority: -- → P1
Unassign myself since I'm not actively working on Firefox OS anymore.
Assignee: jhao → nobody
Status: ASSIGNED → NEW
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: