[Metrics] Timing issue with performaneentry-based metrics

RESOLVED WORKSFORME

Status

RESOLVED WORKSFORME
3 years ago
3 years ago

People

(Reporter: rnicoletti, Assigned: rnicoletti)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Description

3 years ago
My recent testing of the bug 1204938 patch uncovered a timing issue with telemetry metrics that are based on performance marks. Specifically, some performanceentry events are dispatched before the application's performance entry front is started. 

It seems there has been some gecko change that has highlighted this potential timing issue (I had not seen this issue until I updated my m-c repo yestderay -- 9/16). In any event, during my testing of yesterday and today, the 'appLaunch' performance entry event is consistently lost due to the application's performance front starting after the event is dispatched. Without this event, the "app startup time" and "app memory" telemetry metrics cannot be recorded.
(Assignee)

Comment 1

3 years ago
Jan, now that I've encountered this issue, I'm surprised I didn't see it before. I don't know what triggers the actor to be "installed" into the application's process, but I'm not at all surprised since it seems reasonable the 'appLaunch' event would be emitted before the actor is started as "app launch" happens as soon as the app is launched.

I'd appreciate your feedback here. Thanks.
Flags: needinfo?(janx)
(Assignee)

Comment 2

3 years ago
Hi jryans, any input you have is also appreciated.
Flags: needinfo?(jryans)
This sounds very similar to the discussion we had in bug 1177226 comment 61.

I thought the Homescreen app is emitting the "appLaunch" mark, right?  So, you should receive it from Homescreen app, but not the app that is being launched.

Like you say above, the front does not exist when the app to be launched is just starting, so you depend on some pre-existing thing (which I thought was Homescreen) to send this mark.
Flags: needinfo?(jryans)
(Assignee)

Comment 4

3 years ago
Yes, thanks for reminding me of that. 

After more investigation, I've found that the homescreen performance-entry actor is not receiving *any* homescreen `performanceentry` events. I've added the following logging: when an actor is started, the performanceentry events as they get dispatched, and the events as they are received by the performance-entries listener.

For example, see [1] for the logging when the system app actor is started and when system app performanceentry events are dispatched and received. [2] is logging of when the Homescreen actor is started. I'm confident the logging shown at [2] corresponds to the Homescreen actor because even though the actor url and contentDocument are empty, the Gecko process does indicate `Homescreen` and the Homescreen performanceentry events appear immediately after the unnamed actor is started. However, it does seem odd to me the url and contentDocument attributes are empty. Is that indicative of an issue with the actor? Or perhaps simply a red herring.

[1]
I/GeckoConsole(16246): Content JS LOG: actor starting: app://system.gaiamobile.org/index.html 
I/GeckoConsole(16246): Content JS LOG: performanceentry listener set for actor.url: app://system.gaiamobile.org/index.html 1442773445884 
I/GeckoConsole(16246): Content JS LOG: performanceentry listener set for contentDocument: [object HTMLDocument] 1442773445887 
I/Gecko   (16246): console.log: performanceentry listener set for contentDocument: HTMLDocument {"emergency-callback-icon":{},"toaster-icon":{},"location":{"href":"app://system.gaiamobile.org/index.html","origin":"app://system.gaiamobile.org","protocol":"app:","username":"","password":"","host":"system.gaiamobile.org","hostname":"system.gaiamobile.org","port":"","pathname":"/index.html","search":"","hash":""}} 1442773445887 
I/PerformanceTiming(16246): dispatching event visuallyLoaded, now: 1442773450777
I/GeckoConsole(16246): Content JS LOG: onPerformanceEntry, name: visuallyLoaded , app: system.gaiamobile.org , now: 1442773450778 

[2]
I/Default Home Screen( 1054): Content JS LOG: actor starting: about:blank 
I/Default Home Screen( 1054): Content JS LOG: performanceentry listener set for actor.url: about:blank 1442820213971 
I/Default Home Screen( 1054): Content JS LOG: performanceentry listener set for contentDocument: [object HTMLDocument] 1442820213974 
I/Gecko   ( 1054): console.log: performanceentry listener set for contentDocument: HTMLDocument {"location":{"href":"about:blank","origin":"null","protocol":"about:","username":"","password":"","host":"","hostname":"","port":"","pathname":"","search":"","hash":""}} 1442820213974 
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: homescreenStart@system.gaiamobile.org|mark|verticalhome.gaiamobile.org|1505.235000|0|1442820214397
I/PerformanceTiming( 1054): dispatching event homescreenStart@system.gaiamobile.org, now: 1442820214398
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: navigationLoaded|mark|verticalhome.gaiamobile.org|1896.810000|0|1442820214788
I/PerformanceTiming( 1054): dispatching event navigationLoaded, now: 1442820214788
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: navigationInteractive|mark|verticalhome.gaiamobile.org|1898.555000|0|1442820214790
I/PerformanceTiming( 1054): dispatching event navigationInteractive, now: 1442820214790
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: visuallyLoaded|mark|verticalhome.gaiamobile.org|3670.930000|0|1442820216562
I/PerformanceTiming( 1054): dispatching event visuallyLoaded, now: 1442820216563
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: contentInteractive|mark|verticalhome.gaiamobile.org|3671.985000|0|1442820216564
I/PerformanceTiming( 1054): dispatching event contentInteractive, now: 1442820216564
I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit: fullyLoaded|mark|verticalhome.gaiamobile.org|4641.890000|0|1442820217533
I/PerformanceTiming( 1054): dispatching event fullyLoaded, now: 1442820217533
Flags: needinfo?(jryans)

Comment 5

3 years ago
I see two potential problems here:

1) It looks like the performance entry actors don't work properly in apps that are not the System App (this reminds me of the `if(SystemApp)` specific code in hud.js, but that's probably unrelated). Do you always see "actor starting: about:blank" when you start any other app than the System App? And then do you never receive any performance entries from that actor? This might also be an out-of-process issue (most apps are out-of-process, except System and Marketplace if I'm not mistaken).

2) Are you saying that no "appLaunch" marks are ever being dispatched (even if not received) from the Home Screen, even when you launch an app from it? That sounds very suspicious, because it's what https://raptor.mozilla.org/ is using to measure "Cold Launch" timings. I believe you should see something similar to the example below when you tap the Marketplace app icon:

    I/PerformanceTiming( 1054): TimingNotification, after building PerformanceEntryEventInit:
    appLaunch@marketplace.firefox.com|mark|verticalhome.gaiamobile.org|3670.930000|0|1442820216562
Flags: needinfo?(janx) → needinfo?(rnicoletti)
(Assignee)

Comment 6

3 years ago
(In reply to Jan Keromnes [:janx] from comment #5)
> I see two potential problems here:
> 
> 1) It looks like the performance entry actors don't work properly in apps
> that are not the System App (this reminds me of the `if(SystemApp)` specific
> code in hud.js, but that's probably unrelated). Do you always see "actor
> starting: about:blank" when you start any other app than the System App? And
> then do you never receive any performance entries from that actor? This
> might also be an out-of-process issue (most apps are out-of-process, except
> System and Marketplace if I'm not mistaken).

Normally, the app actor does not appear as "about:blank". I say "normally" because just now when I was testing, the sms actor *did* appear as "about:blank" while the other app actors (contacts, dialer, gallery, camera) appeared with the proper actor url. When I killed all the open apps and then re-launched the sms app, its actor appeared with the proper url.

Summary:
* The Homescreen actor always appears as "about:blank"
* The actors for the user-launchable apps normally appear with the proper actor url
* When an actor *does not* have the proper url, it *does not* receive performanceentry events (perhaps it doesn't receive any events but I haven't validated that).
* When an actor *does have* the proper url, it *does receive* performanceentry events

> 
> 2) Are you saying that no "appLaunch" marks are ever being dispatched (even
> if not received) from the Home Screen, even when you launch an app from it?
> That sounds very suspicious, because it's what https://raptor.mozilla.org/
> is using to measure "Cold Launch" timings. I believe you should see
> something similar to the example below when you tap the Marketplace app icon:
> 
>     I/PerformanceTiming( 1054): TimingNotification, after building
> PerformanceEntryEventInit:
>    
> appLaunch@marketplace.firefox.com|mark|verticalhome.gaiamobile.org|3670.
> 930000|0|1442820216562

"appLaunch" events are always being dispatched.
Flags: needinfo?(rnicoletti)

Comment 7

3 years ago
(In reply to Russ Nicoletti [:russn] from comment #6)
> Normally, the app actor does not appear as "about:blank". I say "normally"
> because just now when I was testing, the sms actor *did* appear as
> "about:blank" while the other app actors (contacts, dialer, gallery, camera)
> appeared with the proper actor url. When I killed all the open apps and then
> re-launched the sms app, its actor appeared with the proper url.
> 
> Summary:
> * The Homescreen actor always appears as "about:blank"
> * The actors for the user-launchable apps normally appear with the proper
> actor url
> * When an actor *does not* have the proper url, it *does not* receive
> performanceentry events (perhaps it doesn't receive any events but I haven't
> validated that).
> * When an actor *does have* the proper url, it *does receive*
> performanceentry events

This sounds a lot like a race condition. I'm under the impression that when you start an app process, it briefly points to "about:blank" before navigating to the right location. Sometimes, a performanceEntryActor probably starts too early, and maybe it attaches itself on the wrong window?

Alex, is it possible that we load and start the performanceEntryActor too soon for some apps? It looks like the actor sometimes attaches a listener to an "about:blank" window, and then it never receives any performanceEntry events:

https://dxr.mozilla.org/mozilla-central/source/toolkit/devtools/server/actors/performance-entries.js#42

> "appLaunch" events are always being dispatched.

So we're only fighting with an actor problem on the reception side. Good news!
Flags: needinfo?(jryans) → needinfo?(poirot.alex)
This actor should listen for window-ready/window-destroyed in order to track documents.
Like webaudio editor:
  https://dxr.mozilla.org/mozilla-central/source/toolkit/devtools/server/actors/webaudio.js#565-570
Flags: needinfo?(poirot.alex)
The about:blank is a typical issue on the web, but here it shouldn't hurt.
the window.addEventListener("performanceentry", ... should work, even if we do it while loading the about:blank document before the homescreen.
Using window-ready/window-destroyed is going to help support iframe switching. Or handling iframe within your targeted document.

Here is a serie of dumb questions:
- Isn't `start` called too late? After the event have been fired?
- You may add a log there, with console.log(this.window.document.readyState) to see where you are in the document loading.
- if you add a log in onPerformanceEntry, do you see something in adb logcat for the failing apps?
(Assignee)

Comment 10

3 years ago
I was ready to dismiss this issue after flashing the latest from m-c and not being able to reproduce but then I noticed the first time I launched the sms app its actor's url was "about:blank" and it didn't receive performanceentry events.

Regarding the "dumb" questions:
* I did add logging to onPerformanceEntry and I never see those log entries for apps whose actor url is "about:blank".
* I'm now logging 'this.window.document.readyState' when a performance-entry actor is starting. When the state is 'loading', 'interactive', or 'complete', the actor url is proper. When the readyState is 'uninitialized', the actor url is blank.

New information: When disabling/enabling Developer Tools in the Developer HUD settings, the actor urls for all the performance-entry actors have proper urls and they are getting all the performanceentry events. It's only when the system is starting that some of the actors, almost always homescreen, have a blank url.

So the "about:blank" issue with some actors is certainly intermittent and given it happens at system startup and not when the device is already up and running seems to lend credibility to the timing issue theory.
Flags: needinfo?(poirot.alex)
Flags: needinfo?(janx)
Could you check if listening to window-ready helps?

  start: function () {
   ...
   on(this.tabActor, "window-ready", this._onGlobalCreated);
   ...
  },
  _onGlobalCreated: function ({window}) {
    console.log("New global!");
    window.addEventListener("performanceentry", this.onPerformanceEntry, true);
  }

But I'm not sure you will get a window-ready event...

Otherwise, a workaround could be to first wait for a DOMContentLoaded or a load event on the this.window from start method (only if readyState is uninitialized), before registering the performanceentry listener... But I imagine you want to listen for these events ASAP?
Flags: needinfo?(poirot.alex)
(Assignee)

Comment 12

3 years ago
I didn't have any luck with the 'window-ready' event, I guess the actor is not getting it.

About the DOMContentLoaded workaround, that would be fine. It seems to come before any performanceentry events. But I didn't have much luck with that either. At first I added the DOMContentLoaded listener only when readyState is uninitialized, but the event didn't come in that case. Then, as a sanity check, I added the listener regardless of readyState. Well, the sanity check came up "insane" because the actors whose readyState != 'uninitialized' (and != 'complete') get the DOMContentLoaded event but the actors whose readyState is 'uninitialized' don't get the event. Weird because when I toggle the developer tools and the actors get re-loaded, the ready state for all of the actors is 'complete'.

All of this seems consistent with Jan's statement in comment 7: "Sometimes, a performanceEntryActor probably starts too early, and maybe it attaches itself on the wrong window"

So if it attaches to the wrong window then that "wrong" window won't get the DOMContentLoaded event (nor will it get performanceentry events).
Flags: needinfo?(poirot.alex)
(Assignee)

Comment 13

3 years ago
Alex, Jan, what is the next course of action that should be taken for this issue? It's blocking a very important 2.5 telemetry feature. My sense is Jan's statement in comment 7: "Sometimes, a performanceEntryActor probably starts too early, and maybe it attaches itself on the wrong window" would be worth exploring. What do you think?
Hi Alex/Jan,

Do you have any advice on next steps for Russ?  We need to get this fixed for 2.5.

Thanks,

-tamara

Updated

3 years ago
Blocks: 1152000
[Blocking Requested - why for this release]:
Rationale for making this a blocker is that it is blocking the feature of collecting app startup time as part of the telemetry feature.
blocking-b2g: --- → 2.5?

Updated

3 years ago
QA Whiteboard: [COM=Telemetry]
(Assignee)

Comment 16

3 years ago
With my testing with m-c pulled today ([1]), the problem is not reproducible. The performance entry actors are consistently having the appropriate urls and the 'appLaunch' performance entry events are being received by the homescreen performance entry actor which results in the app-startup metrics being recorded properly. 

As encouraged as I am about this, it would be nice to know if there was a gecko change that brought about the change in behavior. I think I speak for everyone regarding being slightly uncomfortable not know why the behavior changed and if it will remain consistent.

Jan/Alex, if you point me in the right direction, I'll look for recent code changes that may have affected this.

[1] http://hg.mozilla.org/mozilla-central/rev/645b0892ebfc
Status: NEW → ASSIGNED
I can't think of anything landed recently in devtools land to fix such issue.
But may be something in platform changes, may be something related to frame scripts.

Otherwise here is the overview of what is happening with actors.
In the child process for the given app, the parent process send a message, via the message manager to instanciate a ContentActor:
  http://mxr.mozilla.org/mozilla-central/source/devtools/server/child.js#45
We pass it the global of the frame script, on which we retrieve the docshell over here:
  http://mxr.mozilla.org/mozilla-central/source/devtools/server/actors/childtab.js#38
Which is used to compute the window, received by en performance entry actor:
  http://mxr.mozilla.org/mozilla-central/source/devtools/server/actors/webbrowser.js#801

The thing is that, to me, it should be fine if you instanciate the performance entry "early".
The window object should be ok, even if its location is about:blank.
You should be able to listen for load or similar events on it.
And I can't think of anything that would mix the window object so that it's not the one you expect.
Flags: needinfo?(poirot.alex)
Removing the nomination for 2.5 as its not reproducible based on Comment 16. Please re nominate if the issue resurfaces
blocking-b2g: 2.5? → ---
(Assignee)

Updated

3 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(janx)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.