Implement API to add performance marker

RESOLVED WONTFIX

Status

Firefox OS
Performance
P1
normal
RESOLVED WONTFIX
3 years ago
4 months ago

People

(Reporter: jhao, Unassigned)

Tracking

({perf})

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(tracking-b2g:+)

Details

(Whiteboard: [perf-wanted])

Attachments

(1 attachment, 11 obsolete attachments)

(Reporter)

Description

3 years ago
Raptor let us add performance markers in Gaia, but we also have the need to add performance markers in Gecko code.

The interface could be very similar to the Gaia counterpart, e.g. mark(), measure().

We should connect our interface to the implementation of Raptor, so that these entries can be retrieved together with the markers in Gaia.
Can we reuse PROFILER_LABEL or PROFILER_MARKER for the mark(), and add another macro for measure()?
(Reporter)

Comment 2

3 years ago
Patrick said that there may be some cases where we want a marker, but do not want it to be exposed to profiler. Also, the marker could be denser than the profiler markers.

I'll talk to each functional team, asking their use case and what their desired API look like.

Chiajung said it's good to combine PROFILER_LABEL or PROFILER_MARKER and our marker, because it can save duplicated work. That's also exactly what Ting-Yu told me. There are already several of those in the critical parts of graphics code.

Also, the API should be able to work well in both the b2g process and content processes. The markers could also be placed and measured in different threads.
(Reporter)

Comment 3

3 years ago
I've talked to John (media team) and Szu-Yu (RIL team) respectively. They were OK with combining performance markers with PROFILER_LABEL/MARKER.

As for Patrick's concern about exposing too much to the profiler, Chiajung and John both mentioned that we could use a special tag to avoid this.
(Reporter)

Comment 4

3 years ago
I was planning about the performance mark interface in Gecko, and has collected some use cases and suggestions these days.

At first, I planned to use the corresponding C++ implementation of performance.mark[1]. It was not easy because in order to get a nsPerformance object, we need to get a window object.

Then, Thinker reminded me that the purpose of this interface is to generate logcat for Raptor to parse. I looked at where [1] generates logcat, it's

    PERFLOG("Performance Entry: %s|%s|%s|%f|%f|%" PRIu64 "\n",
            uri.get(),
            NS_ConvertUTF16toUTF8(aEntry->GetEntryType()).get(),
            NS_ConvertUTF16toUTF8(aEntry->GetName()).get(),
            aEntry->StartTime(),
            aEntry->Duration(),
            static_cast<uint64_t>(PR_Now() / PR_USEC_PER_MSEC));

which would generate something like:

    Performance Entry: email.gaiamobile.org|mark|navigationLoaded|1754.750363|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|navigationInteractive|1754.869270|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|visuallyLoaded|1755.240624|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|contentInteractive|1755.449478|0.000000|1430794848953
    Performance Entry: email.gaiamobile.org|mark|fullyLoaded|1755.719374|0.000000|1430794848954

The first column is URI, I think we could come up with some convention such as "network.protocol.rtsp" for each component.
The second column is the entry type, which are all "mark" if it's generated by performance.mark().
For all marks, the duration is 0 by definition, which explains the fourth column. Non-zero duration occurs when entries are produced by performance.measure().
The fifth column, as you can see, is the absolute time of the moment.

The above could be done without using nsPerformance. As for the third column, it represents the elapsed time counting from the start of navigation.
In Gecko, we may have to set a reference point of time by ourselves, but I'm not sure when it should be.

Or perhaps we should focus on the difference between the marks? (which is what performance.measure() does)

[1] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.h#341
[2] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.cpp#678

Comment 5

3 years ago
Hey Jonathan,

You are correct about reference points with regards to measurements. In order for User Timing/performance marks to determine how long something takes, there is always a start mark and an end mark. In your standard web page, the start mark is implicit: performance.timing.navigationStart. Any mark created is essentially an implicit *performance.measure* between the time the mark was created and navigationStart. It's strange; all marks are technically just measures with a hard-coded starting point.

Raptor uses this same principle, but changes the start mark away from navigationStart to a different performance mark generated by the homescreen: appStart. Instead of calculating the delta between navigationStart and a marker to generated the time, we generate the delta between appStart and the marker. It's very similar, it just makes times a little longer than before to account for application startup time done by the System and homescreen.

In order for marks/measures to mean anything, they need a reference point to base their duration on. Without an application start or navigationStart (since these belong to the window or app), you need a different reference point. I think the usage of a performance interface for generating marks and measures is fine, but how you go about it depends on this reference point. Without a user-defined point, you couldn't rely on High-Resolution time, instead you would need to use absolute times, i.e. epochs, to generate marks, and have a measure method that would do the delta between those marks. I can't see any way of using a high-resolution timer without first having a defined reference point.

I'm sure you know all this, but I guess I just wanted to clear up what needs to happen here:

1. Generate an absolute timestamp for every "mark", and then "measure" the delta between the timestamp. Going this route would not require a reference marker.

Or:

2. Generate a high-resolution time for every "mark", again using "measure" to calculate the difference. This would require that every mark have some reference point for which to generate the initial delta.
(Reporter)

Comment 6

3 years ago
Thanks Eli for the detailed explanation.

I think I'll use absolute timestamp for now, until we ever come up with a reasonable reference point.

I'm planning to create an XPCOM object for this API.
(Reporter)

Comment 7

3 years ago
Created attachment 8604031 [details] [diff] [review]
(WIP) Designed as an XPCOM service. Not implemented yet.
(Reporter)

Comment 8

3 years ago
Created attachment 8606155 [details] [diff] [review]
WIP patch

We can now measure the time difference between two markers.
Attachment #8604031 - Attachment is obsolete: true
(Reporter)

Comment 9

3 years ago
Created attachment 8607420 [details] [diff] [review]
Performance Mark API - V1

I set the reference point as the time when the XPCOM object is created.
It now produces logs compatible with Raptor.
(Reporter)

Comment 10

3 years ago
Created attachment 8607429 [details] [diff] [review]
Sample usage of this API

To add a performance mark in Gecko:

nsCOMPtr mPerformance = do_GetService(NS_PERFORMANCE_CONTRACTID);
mPerformance->Mark("sample_mark");

For example, in this sample patch I add two marks in MediaDecoderStateMachine, when it starts to decode metadata and first frame.
These code will be run when the booting animation appears.

In gaia folder, run |make raptor| and
|RUNS=1 APP=settings node tests/raptor/restart_b2g_test|
we'll see the results

Metric                                        Mean       Median     Min        Max        StdDev  p95
--------------------------------------------  ---------  ---------  ---------  ---------  ------  ---
restartb2g.loadEnd                            10476.290  10476.290  10476.290  10476.290  0.000   n/a
restartb2g.DECODER_STATE_DECODING_METADATA    19077.290  19077.290  19077.290  19077.290  0.000   n/a
restartb2g.DECODER_STATE_DECODING_FIRSTFRAME  19079.290  19079.290  19079.290  19079.290  0.000   n/a
restartb2g.homescreenStart                    24578.290  24578.290  24578.290  24578.290  0.000   n/a
restartb2g.navigationLoaded                   24913.290  24913.290  24913.290  24913.290  0.000   n/a
restartb2g.navigationInteractive              24914.290  24914.290  24914.290  24914.290  0.000   n/a
restartb2g.osLogoEnd                          25083.290  25083.290  25083.290  25083.290  0.000   n/a
(Reporter)

Updated

3 years ago
Attachment #8606155 - Attachment is obsolete: true
(Reporter)

Comment 11

3 years ago
Created attachment 8607445 [details] [diff] [review]
Performance Mark API - V1

Hi Eli,

I wanted to reuse the code of the User Timing API, but I found it hard to do so.
Currently, I didn't reuse any of them. I just tried to make it produce similar logs so that Raptor can parse them.

Could you recommend someone that I could ask feedback or review from?
I've talked to Thinker's team, but they weren't sure who's in charge with this either.

I'm also not really sure which folder this piece of code should be in.

P.S. This patch is essentially the same as attachment 8607420 [details] [diff] [review]. Some redundant lines are removed.

Thanks!
Attachment #8607420 - Attachment is obsolete: true
Flags: needinfo?(eperelman)

Comment 12

3 years ago
Hey :baku, do you have any thoughts on this?
Flags: needinfo?(eperelman) → needinfo?(amarchesini)
:jhao, I can review your code, but, going quickly through your patch, I see that you have done sometime similar to what I did for bug 1155761. Basically I implemented a PerformanceBase class that can be used in workers and main-thread in order to share the basic methods for the Performance Timing API.

Hopefully my patch should land to m-i this week.
Wondering if my patch can be useful and we can merge your code with mine somehow.
Flags: needinfo?(amarchesini) → needinfo?(jhao)
(Reporter)

Comment 14

3 years ago
Hi Andrea,

It looks very useful. I think it would be great to merge our code.

Thanks a lot!
Depends on: 1155761
Flags: needinfo?(jhao)

Updated

3 years ago
tracking-b2g: --- → +
Keywords: perf
Whiteboard: [perf-wanted]
(Reporter)

Comment 15

3 years ago
Hi Andrea,

Now that bug 1155761 is resolved, I want to make my class inherit PerformanceBase to avoid duplicated code, but there's one problem.

This assertion occured in PerformanceEntry and PerformanceMeasure:
MOZ_ASSERT(mParent || !NS_IsMainThread());

It forbids us to use Performance.Mark() in main thread, since we don't have mParent.
We wish to be able to use Performance.Mark() in every thread.

How do you suggest we deal with this problem?
Thanks very much!
Flags: needinfo?(amarchesini)
> This assertion occured in PerformanceEntry and PerformanceMeasure:
> MOZ_ASSERT(mParent || !NS_IsMainThread());

mParent is just use for GetParentObject(). Change this assertion to something that covers your usecase.
Flags: needinfo?(amarchesini)
(Reporter)

Comment 17

3 years ago
Created attachment 8617188 [details] [diff] [review]
Reuse PerformanceBase

In this patch I tried to reuse PerformanceBase. It works fine most of the time.
But, when I try to test it by adding a mark in MediaDecoderStateMachine, it will crash and keeps reopening b2g process.
It crashes in NS_CycleCollectorSuspect3. The crash reason is that the decoder thread didn't start any cycle collector.
(Reporter)

Comment 18

3 years ago
Created attachment 8617189 [details] [diff] [review]
Testing on MediaDecoderStateMachine

This patch is to test on MediaDecoderStateMachine.
(Reporter)

Comment 19

3 years ago
Created attachment 8617199 [details] [diff] [review]
Presentation Mark API V2 (reuse PerformanceBase)

fixed a problem, but the situation is still the same
Attachment #8617188 - Attachment is obsolete: true
(Reporter)

Comment 20

3 years ago
I have confirmed with JW. When we called the function |Mark()|, it creates a PerformanceMark. JW said that anything with cycle collection must be created in the main thread (or perhaps worker thread).

So, if we still want to reuse code by inheriting PerformanceBase and be able to use it out of main thread, we'll need two kinds of PerformanceMarks, one with cycle collection and one without.

Hi :baku, do you think we should do that?
Flags: needinfo?(amarchesini)
Yes, you are right. But you don't care about PerformanceEntry/PerformanceMark. Probably you can change the code so that we create PerformanceEntry only when needed and we store some non CCed data in the arrays of entries.
Flags: needinfo?(amarchesini)
(Reporter)

Comment 22

3 years ago
Created attachment 8624153 [details] [diff] [review]
Performance mark in Gecko

Hi :baku,
Could you review my patch?
Attachment #8607429 - Attachment is obsolete: true
Attachment #8607445 - Attachment is obsolete: true
Attachment #8617189 - Attachment is obsolete: true
Attachment #8617199 - Attachment is obsolete: true
Attachment #8624153 - Flags: review?(amarchesini)
Comment on attachment 8624153 [details] [diff] [review]
Performance mark in Gecko

Review of attachment 8624153 [details] [diff] [review]:
-----------------------------------------------------------------

You cannot remove the CC from PerformanceEntry.
What I meant was something like:

1. instead having mUserEntries create something like:

struct UserEntry {
  nsRefPtr<PerformanceEntry> mEntry; // Initially null
  EntryType mType; // This can be just Mark and Meausre
  nsString mName;
  DOMHighResTimeStamp mStartTime;
  DOMHighResTimeStamp mDuration;
}

nsTArray<UserEntry> mUserEntries;

So, basically InsertUserEntry() adds a new data struct without creating a UserEntry.mEntry object.
Then in GetEntry* methods (not exposed in your nsIPerformance idl) you call:

PerformanceEntry* GetOrCreatePerformanceEntry(UserEntry& aEntry);

that creates mEntry in case it's null using mType, mName, mStartTime and mDuration.
How does it sound?
Attachment #8624153 - Flags: review?(amarchesini) → review-
(Reporter)

Comment 24

3 years ago
Created attachment 8628102 [details] [diff] [review]
Performance Mark API (V3)

This patch can be compiled and pass the test "test_performance_user_timing_html" in my computer.

I am writing a chrome test as :baku suggested in Whistler.
Attachment #8624153 - Attachment is obsolete: true
(Reporter)

Comment 25

3 years ago
Created attachment 8628287 [details] [diff] [review]
Chrome test

Hi :baku,

I wrote this test, but when I ran it, it produced this error:

7 INFO TEST-UNEXPECTED-FAIL | dom/base/test/chrome/test_bug1159639.xul | uncaught exception - NS_ERROR_XPC_CANT_CREATE_WN: Component returned failure code: 0x80570019 (NS_ERROR_XPC_CANT_CREATE_WN) [nsIJSCID.getService] at chrome://mochitests/content/chrome/dom/base/test/chrome/test_bug1159639.xul:30

NS_ERROR_XPC_CANT_CREATE_WN seems to come from https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSID.cpp?from=XPCJSID.cpp&case=true#696 but I don't really understand why. And I can getService without problems in C++ code.

Do you have any idea what the reason is?

Thanks!
Attachment #8628287 - Flags: feedback?(amarchesini)
Jonathan, sorry for the delay. I hope to give you an answer for this week!
(Reporter)

Comment 27

3 years ago
Created attachment 8643621 [details] [diff] [review]
Performance Mark API (V4)

The previous patch is out-of-date. This one is rebased.
I tried xpcshell-test, and it still crashed when I tried to getService.

I wonder if I exposed the XPCOM module wrong.

Since baku is away until 11/8, I'll have to find someone familiar with XPCOM to help.
Attachment #8628102 - Attachment is obsolete: true
Attachment #8628287 - Attachment is obsolete: true
Attachment #8628287 - Flags: feedback?(amarchesini)

Updated

3 years ago
Status: NEW → ASSIGNED

Updated

3 years ago
Priority: -- → P1
(Reporter)

Comment 28

2 years ago
Unassign myself since I'm not actively working on Firefox OS anymore.
Assignee: jhao → nobody
Status: ASSIGNED → NEW

Comment 29

4 months ago
Firefox OS is not being worked on
Status: NEW → RESOLVED
Last Resolved: 4 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.