Last Comment Bug 670008 - Telemetry occasionally reports negative startup numbers(main, firstPaint,sessionRestored)
: Telemetry occasionally reports negative startup numbers(main, firstPaint,sess...
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: unspecified
: x86 Linux
: -- normal (vote)
: mozilla8
Assigned To: (dormant account)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-07 14:33 PDT by (dormant account)
Modified: 2011-07-15 06:49 PDT (History)
2 users (show)
bzbarsky: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
more error checking (2.29 KB, patch)
2011-07-07 16:51 PDT, (dormant account)
no flags Details | Diff | Splinter Review
more error checking (2.80 KB, patch)
2011-07-07 16:57 PDT, (dormant account)
mh+mozilla: review-
Details | Diff | Splinter Review
v2 (5.01 KB, patch)
2011-07-11 16:06 PDT, (dormant account)
mh+mozilla: review+
Details | Diff | Splinter Review

Description (dormant account) 2011-07-07 14:33:14 PDT
This isn't consistent, sometimes just one number is negative. Happens on all 3 platforms, is most likely on Linux
Comment 1 Mike Hommey [:glandium] 2011-07-07 14:35:23 PDT
In all likeliness, the process startup time is the one that is wrong.
Comment 2 (dormant account) 2011-07-07 14:39:10 PDT
(In reply to comment #1)
> In all likeliness, the process startup time is the one that is wrong.

I think there are probably multiple bugs here.
Comment 3 Mike Hommey [:glandium] 2011-07-07 14:43:51 PDT
If I recall correctly, all timings for main, firstPaint and sessionRestore are taken the same way. It seems unlikely to be any problem with that, except for ntp adjustements, but even that is unlikely. Only process startup time is taken with a different method. There could be a discrepancy in the clocks/timers being involved system wise, or a problem with the way we handle the process startup time when restarting.
Comment 4 (dormant account) 2011-07-07 16:51:05 PDT
Created attachment 544656 [details] [diff] [review]
more error checking
Comment 5 (dormant account) 2011-07-07 16:57:18 PDT
Created attachment 544660 [details] [diff] [review]
more error checking

This makes negative values impossible.
Comment 6 Mike Hommey [:glandium] 2011-07-07 22:19:40 PDT
Comment on attachment 544660 [details] [diff] [review]
more error checking

Review of attachment 544660 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/startup/nsAppStartup.cpp
@@ +605,1 @@
>    gProcessCreationTimestamp = now - interval;

It's probably time to use the right APIs for that. I'll take a look.

@@ +737,5 @@
> +  if (gFirstPaintTimestamp >= gXRE_mainTimestamp)
> +    MaybeDefineProperty(cx, obj, "firstPaint", gFirstPaintTimestamp);
> +
> +  if (gRestoredTimestamp >= gXRE_mainTimestamp)
> +    MaybeDefineProperty(cx, obj, "sessionRestored", gRestoredTimestamp);

I don't think it's wise to actually hide the problem. We should try to fix it and observe the data we get in return.

::: toolkit/components/telemetry/TelemetryPing.js
@@ +179,4 @@
>      // uptime in minutes
>      uptime: Math.round((new Date() - si.process) / 60000)
>    }
> +  if (process) {

si.process
Comment 7 Mike Hommey [:glandium] 2011-07-07 22:28:59 PDT
Great, the taststats API doesn't provide the process startup time at high resolution :(
  	/* The time when a task begins, in [secs] since 1970. */
	__u32	ac_btime;		/* Begin time [sec since 1970] */

There's an elapsed time field in µs, though, so PR_Now() - that might work.
Comment 8 (dormant account) 2011-07-08 10:46:53 PDT
(In reply to comment #7)
> Great, the taststats API doesn't provide the process startup time at high
> resolution :(
>   	/* The time when a task begins, in [secs] since 1970. */
> 	__u32	ac_btime;		/* Begin time [sec since 1970] */
> 
> There's an elapsed time field in µs, though, so PR_Now() - that might work.

We can do taskstats switch in another patch, for now lets get this thing to not report insane numbers.

I agree that it is beneficial to be able to spot when wrong numbers are generated, but we can use telemetry for that. I would rather sanitize the numbers on the clientside than on the server
Comment 9 Mike Hommey [:glandium] 2011-07-08 10:53:12 PDT
(In reply to comment #8)
> I agree that it is beneficial to be able to spot when wrong numbers are
> generated, but we can use telemetry for that.

Can we?
Comment 10 (dormant account) 2011-07-11 16:06:47 PDT
Created attachment 545278 [details] [diff] [review]
v2

This adds telemetry probes, addresses comments
Comment 11 Mike Hommey [:glandium] 2011-07-12 00:39:14 PDT
Comment on attachment 545278 [details] [diff] [review]
v2

Review of attachment 545278 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +102,4 @@
>  #undef HTTP_HISTOGRAMS
>  HISTOGRAM(FIND_PLUGINS, 1, 3000, 10, EXPONENTIAL, "Time spent scanning filesystem for plugins (ms)")
>  HISTOGRAM(CHECK_JAVA_ENABLED, 1, 3000, 10, EXPONENTIAL, "Time spent checking if Java is enabled (ms)")
> +HISTOGRAM(STARTUP_MEASUREMENT_ERRORS, 1, 3, 4, LINEAR, "Flags errors in startup calculation()")

I think that needs to be 1, 3, 5 or 1, 2, 4, but I could be wrong.
Comment 12 (dormant account) 2011-07-12 18:38:09 PDT
(In reply to comment #11)
> Comment on attachment 545278 [details] [diff] [review] [review]
> v2
> 
> Review of attachment 545278 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryHistograms.h
> @@ +102,4 @@
> >  #undef HTTP_HISTOGRAMS
> >  HISTOGRAM(FIND_PLUGINS, 1, 3000, 10, EXPONENTIAL, "Time spent scanning filesystem for plugins (ms)")
> >  HISTOGRAM(CHECK_JAVA_ENABLED, 1, 3000, 10, EXPONENTIAL, "Time spent checking if Java is enabled (ms)")
> > +HISTOGRAM(STARTUP_MEASUREMENT_ERRORS, 1, 3, 4, LINEAR, "Flags errors in startup calculation()")
> 
> I think that needs to be 1, 3, 5 or 1, 2, 4, but I could be wrong.

What I have in the code is correct.  It creates 4 buckets 0-3.
Comment 13 Boris Zbarsky [:bz] 2011-07-14 19:18:32 PDT
Please, please ask for checkin-needed on patches with the right commit message?

http://hg.mozilla.org/integration/mozilla-inbound/rev/caf2dc068c23

Note You need to log in before you can comment on or make changes to this bug.