Last Comment Bug 670008 - Telemetry occasionally reports negative startup numbers(main, firstPaint,sessionRestored)
: Telemetry occasionally reports negative startup numbers(main, firstPaint,sess...
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Telemetry (show other bugs)
: unspecified
: x86 Linux
: -- normal (vote)
: mozilla8
Assigned To: (dormant account)
:
: Georg Fritzsche [:gfritzsche] [away Jan 14 - 24]
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-07-07 14:33 PDT by (dormant account)
Modified: 2011-07-15 06:49 PDT (History)
2 users (show)
bzbarsky: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
more error checking (2.29 KB, patch)
2011-07-07 16:51 PDT, (dormant account)
no flags Details | Diff | Splinter Review
more error checking (2.80 KB, patch)
2011-07-07 16:57 PDT, (dormant account)
mh+mozilla: review-
Details | Diff | Splinter Review
v2 (5.01 KB, patch)
2011-07-11 16:06 PDT, (dormant account)
mh+mozilla: review+
Details | Diff | Splinter Review

Description User image (dormant account) 2011-07-07 14:33:14 PDT
This isn't consistent, sometimes just one number is negative. Happens on all 3 platforms, is most likely on Linux
Comment 1 User image Mike Hommey [:glandium] 2011-07-07 14:35:23 PDT
In all likeliness, the process startup time is the one that is wrong.
Comment 2 User image (dormant account) 2011-07-07 14:39:10 PDT
(In reply to comment #1)
> In all likeliness, the process startup time is the one that is wrong.

I think there are probably multiple bugs here.
Comment 3 User image Mike Hommey [:glandium] 2011-07-07 14:43:51 PDT
If I recall correctly, all timings for main, firstPaint and sessionRestore are taken the same way. It seems unlikely to be any problem with that, except for ntp adjustements, but even that is unlikely. Only process startup time is taken with a different method. There could be a discrepancy in the clocks/timers being involved system wise, or a problem with the way we handle the process startup time when restarting.
Comment 4 User image (dormant account) 2011-07-07 16:51:05 PDT
Created attachment 544656 [details] [diff] [review]
more error checking
Comment 5 User image (dormant account) 2011-07-07 16:57:18 PDT
Created attachment 544660 [details] [diff] [review]
more error checking

This makes negative values impossible.
Comment 6 User image Mike Hommey [:glandium] 2011-07-07 22:19:40 PDT
Comment on attachment 544660 [details] [diff] [review]
more error checking

Review of attachment 544660 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/startup/nsAppStartup.cpp
@@ +605,1 @@
>    gProcessCreationTimestamp = now - interval;

It's probably time to use the right APIs for that. I'll take a look.

@@ +737,5 @@
> +  if (gFirstPaintTimestamp >= gXRE_mainTimestamp)
> +    MaybeDefineProperty(cx, obj, "firstPaint", gFirstPaintTimestamp);
> +
> +  if (gRestoredTimestamp >= gXRE_mainTimestamp)
> +    MaybeDefineProperty(cx, obj, "sessionRestored", gRestoredTimestamp);

I don't think it's wise to actually hide the problem. We should try to fix it and observe the data we get in return.

::: toolkit/components/telemetry/TelemetryPing.js
@@ +179,4 @@
>      // uptime in minutes
>      uptime: Math.round((new Date() - si.process) / 60000)
>    }
> +  if (process) {

si.process
Comment 7 User image Mike Hommey [:glandium] 2011-07-07 22:28:59 PDT
Great, the taststats API doesn't provide the process startup time at high resolution :(
  	/* The time when a task begins, in [secs] since 1970. */
	__u32	ac_btime;		/* Begin time [sec since 1970] */

There's an elapsed time field in µs, though, so PR_Now() - that might work.
Comment 8 User image (dormant account) 2011-07-08 10:46:53 PDT
(In reply to comment #7)
> Great, the taststats API doesn't provide the process startup time at high
> resolution :(
>   	/* The time when a task begins, in [secs] since 1970. */
> 	__u32	ac_btime;		/* Begin time [sec since 1970] */
> 
> There's an elapsed time field in µs, though, so PR_Now() - that might work.

We can do taskstats switch in another patch, for now lets get this thing to not report insane numbers.

I agree that it is beneficial to be able to spot when wrong numbers are generated, but we can use telemetry for that. I would rather sanitize the numbers on the clientside than on the server
Comment 9 User image Mike Hommey [:glandium] 2011-07-08 10:53:12 PDT
(In reply to comment #8)
> I agree that it is beneficial to be able to spot when wrong numbers are
> generated, but we can use telemetry for that.

Can we?
Comment 10 User image (dormant account) 2011-07-11 16:06:47 PDT
Created attachment 545278 [details] [diff] [review]
v2

This adds telemetry probes, addresses comments
Comment 11 User image Mike Hommey [:glandium] 2011-07-12 00:39:14 PDT
Comment on attachment 545278 [details] [diff] [review]
v2

Review of attachment 545278 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +102,4 @@
>  #undef HTTP_HISTOGRAMS
>  HISTOGRAM(FIND_PLUGINS, 1, 3000, 10, EXPONENTIAL, "Time spent scanning filesystem for plugins (ms)")
>  HISTOGRAM(CHECK_JAVA_ENABLED, 1, 3000, 10, EXPONENTIAL, "Time spent checking if Java is enabled (ms)")
> +HISTOGRAM(STARTUP_MEASUREMENT_ERRORS, 1, 3, 4, LINEAR, "Flags errors in startup calculation()")

I think that needs to be 1, 3, 5 or 1, 2, 4, but I could be wrong.
Comment 12 User image (dormant account) 2011-07-12 18:38:09 PDT
(In reply to comment #11)
> Comment on attachment 545278 [details] [diff] [review] [review]
> v2
> 
> Review of attachment 545278 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/telemetry/TelemetryHistograms.h
> @@ +102,4 @@
> >  #undef HTTP_HISTOGRAMS
> >  HISTOGRAM(FIND_PLUGINS, 1, 3000, 10, EXPONENTIAL, "Time spent scanning filesystem for plugins (ms)")
> >  HISTOGRAM(CHECK_JAVA_ENABLED, 1, 3000, 10, EXPONENTIAL, "Time spent checking if Java is enabled (ms)")
> > +HISTOGRAM(STARTUP_MEASUREMENT_ERRORS, 1, 3, 4, LINEAR, "Flags errors in startup calculation()")
> 
> I think that needs to be 1, 3, 5 or 1, 2, 4, but I could be wrong.

What I have in the code is correct.  It creates 4 buckets 0-3.
Comment 13 User image Boris Zbarsky [:bz] (still a bit busy) 2011-07-14 19:18:32 PDT
Please, please ask for checkin-needed on patches with the right commit message?

http://hg.mozilla.org/integration/mozilla-inbound/rev/caf2dc068c23

Note You need to log in before you can comment on or make changes to this bug.