Closed Bug 841323 Opened 11 years ago Closed 11 years ago

Perf review for FHR datareporting/

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Yoric, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [Snappy:P1])

Attachments

(1 file)

Just as bug 836719, but different directory.
Attachment #713860 - Flags: feedback?(gps)
Comment on attachment 713860 [details] [diff] [review]
First look at datareporting/

Review of attachment 713860 [details] [diff] [review]:
-----------------------------------------------------------------

::: services/datareporting/policy.jsm
@@ +321,5 @@
>     */
>    POLL_INTERVAL_MSEC: (60 * 1000) + Math.floor(2.5 * 1000 * Math.random()),
> +// Do I understand correctly that this poll is an implementation of cron?
> +// If so, this might deserve its own component, possibly in toolkit/.
> +// Also, this could certainly be implemented without polling.

It is essentially a cron, yes. Does it deserve to live in Toolkit, not IMO.

We could certainly implement things not using "cron." The initial plan was to drive hourly and possibly even minutely collections off this timer. We figured 1 timer firing somewhat rarely was a cost complexity win against 3 or 4 separate timers. 1 master driving function to rule all activity.

Is a timer firing every minute that big of a deal? What performance impact does it have? How many such recurring timers exist in the tree? If FHR is one of several, I don't see a big deal. i.e. please put this in perspective for me.

@@ +696,5 @@
>     *
>     * You typically call this function for each constructed instance.
>     */
> +// Something is not clear to me: how many instances are there?
> +// Not clear to me either: why do we need to poll at all?

There is one instance of this policy object in the app. We made it a constructable type to foster testing. If you look at the unit tests for all of FHR, you'll notice we instantiate new instances of everything pointing against their own pristine pref branches and databases. This makes it much, much easier to test since we don't have to worry about tests trampling on others or the system service.

@@ +822,5 @@
>     * @return bool Whether user has responded to data policy.
>     */
>    ensureNotifyResponse: function ensureNotifyResponse(now) {
> +// Wait, that's triggered by polling? Why aren't we rather waiting for an event
> +// + a set of fulfilled promises?

We could use separate timers or wait on events/promises for notification response if we wanted to. We do need a timer somewhere as the data upload policy is implicitly accepted after 8 hours of no user input on the notification bar.

@@ +932,5 @@
>        this._inProgressSubmissionRequest = null;
>        this._handleSubmissionFailure();
>      }.bind(this);
>  
> +// Nit: As usual, this could be made clearer and less spaghetti-prone with Task.jsm

This file was implemented before I knew about Task.jsm. If I were to write it today, it would likely use Task.jsm.

@@ +973,5 @@
>        // the server. However, the frequency of delete requests across all
>        // clients should be low, so this shouldn't pose a problem.
> +// I'm trying to understand: if I understand this comment correctly, we are
> +// trusting the clients on not DDoS-ing the server. Is that correct? If so,
> +// there is something rotten here.

The comment errors on the side of paranoia. We go through great lengths to honor backoffs, etc everywhere in the code except for here, which ignores those things. In our opinion, the importance of deleting data outweighs a very minor increased risk of DDoS. The comment could be improved.

Also, since FHR is installed on every Firefox client and Firefox has a few hundred million instances, every time it talks with a remote server there is a potential for DDoS. Welcome to services at scale.

::: services/datareporting/sessions.jsm
@@ +233,5 @@
>      this._os.addObserver(this, "profile-before-change", false);
>      this._os.addObserver(this, "user-interaction-active", false);
>      this._os.addObserver(this, "user-interaction-inactive", false);
> +// That looks elegant but quite expensive. Couldn't we just patch
> +// nsEventStateManager to provide statistics?

Sure. If someone wants to code C++ to do this, we can switch FHR to use that. This was implemented in JS so we could meet deadlines.

@@ +319,5 @@
>    _deserialize: function (s) {
>      let o;
>      try {
>        o = JSON.parse(s);
> +// That object should pretty small, am I right?

About 80 bytes per entry.
Attachment #713860 - Flags: feedback?(gps) → feedback+
(In reply to Gregory Szorc [:gps] from comment #1)
> It is essentially a cron, yes. Does it deserve to live in Toolkit, not IMO.

As it turns out, we already have a cron in Toolkit: nsIUpdateTimerManager (thanks Mossop).

> We could certainly implement things not using "cron." The initial plan was
> to drive hourly and possibly even minutely collections off this timer. We
> figured 1 timer firing somewhat rarely was a cost complexity win against 3
> or 4 separate timers. 1 master driving function to rule all activity.
> 
> Is a timer firing every minute that big of a deal? What performance impact
> does it have? How many such recurring timers exist in the tree? If FHR is
> one of several, I don't see a big deal. i.e. please put this in perspective
> for me.

Actually, the deal is that we are attempting to get rid of all such polling timers in the codebase. So when I see an opportunity to prevent one from being added, I take it :)

> 
> @@ +696,5 @@
> >     *
> >     * You typically call this function for each constructed instance.
> >     */
> > +// Something is not clear to me: how many instances are there?
> > +// Not clear to me either: why do we need to poll at all?
> 
> There is one instance of this policy object in the app. We made it a
> constructable type to foster testing. If you look at the unit tests for all
> of FHR, you'll notice we instantiate new instances of everything pointing
> against their own pristine pref branches and databases. This makes it much,
> much easier to test since we don't have to worry about tests trampling on
> others or the system service.

Ok. Whenever you find time to add documentation, one line on the subject might be useful.

> 
> @@ +822,5 @@
> >     * @return bool Whether user has responded to data policy.
> >     */
> >    ensureNotifyResponse: function ensureNotifyResponse(now) {
> > +// Wait, that's triggered by polling? Why aren't we rather waiting for an event
> > +// + a set of fulfilled promises?
> 
> We could use separate timers or wait on events/promises for notification
> response if we wanted to. We do need a timer somewhere as the data upload
> policy is implicitly accepted after 8 hours of no user input on the
> notification bar.

I'd be happier with events/promises (even if one of the promises hides a timeout for auto-accepting).

> 
> @@ +932,5 @@
> >        this._inProgressSubmissionRequest = null;
> >        this._handleSubmissionFailure();
> >      }.bind(this);
> >  
> > +// Nit: As usual, this could be made clearer and less spaghetti-prone with Task.jsm
> 
> This file was implemented before I knew about Task.jsm. If I were to write
> it today, it would likely use Task.jsm.

Fair enough.

> @@ +973,5 @@
> >        // the server. However, the frequency of delete requests across all
> >        // clients should be low, so this shouldn't pose a problem.
> > +// I'm trying to understand: if I understand this comment correctly, we are
> > +// trusting the clients on not DDoS-ing the server. Is that correct? If so,
> > +// there is something rotten here.
> 
> The comment errors on the side of paranoia. We go through great lengths to
> honor backoffs, etc everywhere in the code except for here, which ignores
> those things. In our opinion, the importance of deleting data outweighs a
> very minor increased risk of DDoS. The comment could be improved.

Yeah, I suspected that the problem was not in the algorithm but in a slightly ambiguous comment. As usual, please fix this once you find time.

> Also, since FHR is installed on every Firefox client and Firefox has a few
> hundred million instances, every time it talks with a remote server there is
> a potential for DDoS. Welcome to services at scale.

:)

> 
> ::: services/datareporting/sessions.jsm
> @@ +233,5 @@
> >      this._os.addObserver(this, "profile-before-change", false);
> >      this._os.addObserver(this, "user-interaction-active", false);
> >      this._os.addObserver(this, "user-interaction-inactive", false);
> > +// That looks elegant but quite expensive. Couldn't we just patch
> > +// nsEventStateManager to provide statistics?
> 
> Sure. If someone wants to code C++ to do this, we can switch FHR to use
> that. This was implemented in JS so we could meet deadlines.

I guess you have discussed this with Vlad or Taras by now, so I'm not going to pursue this topic.

> @@ +319,5 @@
> >    _deserialize: function (s) {
> >      let o;
> >      try {
> >        o = JSON.parse(s);
> > +// That object should pretty small, am I right?
> 
> About 80 bytes per entry.

Sounds good.
I'm going to call this done. Thanks folks!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Metrics and Firefox Health Report → Client: Desktop
Product: Mozilla Services → Firefox Health Report
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: