Closed Bug 693667 Opened 8 years ago Closed 8 years ago

Track time from requesting an async query to completion via telemetry

Categories

(Toolkit :: Places, defect)

x86
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: taras.mozilla, Assigned: Yoric)

References

Details

(Whiteboard: [inbound])

Attachments

(2 files, 3 obsolete files)

bug 684513 seems to be running an awesomebar query during shutdown, we don't know how common this is.
Assignee: nobody → dteller
Here's a proposed patch. I do not have access to the Try Server yet and I do not see how to write an automated test for it.
Try run for 508e034b5582 is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=508e034b5582
Results (out of 18 total builds):
    exception: 16
    failure: 2
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dteller@mozilla.com-508e034b5582
Attaching a candidate patch.
Attachment #566764 - Attachment is obsolete: true
Attachment #568612 - Flags: review?(tglek)
Comment on attachment 568612 [details] [diff] [review]
Telemetry probe for async query completion


>     if (rc == SQLITE_DONE)
>+    {
>+      finalizeStatistics();
>       return false;
>+    }

This is too fragile. Make a minimal RAII class and use destructor to finalize. 

> 
>     // If we got results, we can return now.
>     if (rc == SQLITE_ROW)
>+    {
>+      finalizeStatistics();
>       return true;
>+    }
> 
> 
>+void AsyncExecuteStatements::finalizeStatistics()
>+{
>+  PRTime finishDate = PR_Now();
>+  mozilla::Telemetry::Accumulate(mozilla::Telemetry::MOZ_STORAGE_ASYNC_REQUESTS_MS, (finishDate - mRequestStartDate)/1000);

Use Telemetry::AccumulateTimeDelta instead. Also you shouldn't need mozilla:: here.

>+  void finalizeStatistics();

This does not need to live in the class.
>   SQLiteMutex &mDBMutex;
>+  
>+  /**
>+   * The instant at which the request was started.
>+   *
>+   * Used by telemetry.
>+   */
>+  PRTime mRequestStartDate;

Use TimeStamp in C++ code.
>+HISTOGRAM(MOZ_STORAGE_ASYNC_REQUESTS_MS, 1, 32768, 50, LINEAR, "mozStorage async requests completion (ms)")

Should be exponential and use 10(or maybe 20) buckets. We don't need super-high precision here just a ballpark figure of how long we are waiting.  What's 32768?
Attachment #568612 - Flags: review?(tglek) → review-
(In reply to Taras Glek (:taras) from comment #4)
> Comment on attachment 568612 [details] [diff] [review] [diff] [details] [review]
> Telemetry probe for async query completion
> 
> 
> >     if (rc == SQLITE_DONE)
> >+    {
> >+      finalizeStatistics();
> >       return false;
> >+    }
> 
> This is too fragile. Make a minimal RAII class and use destructor to
> finalize. 

Ok. For the moment, I only attempted to track successful transactions, I will add RAII and track also those that fail.

> >+void AsyncExecuteStatements::finalizeStatistics()
> >+{
> >+  PRTime finishDate = PR_Now();
> >+  mozilla::Telemetry::Accumulate(mozilla::Telemetry::MOZ_STORAGE_ASYNC_REQUESTS_MS, (finishDate - mRequestStartDate)/1000);
> 
> Use Telemetry::AccumulateTimeDelta instead. Also you shouldn't need
> mozilla:: here.

Ok. I have used an older version of Telemetry which did not have AccumulateTimeDelta.

> 
> >+  void finalizeStatistics();
> 
> This does not need to live in the class.
> >   SQLiteMutex &mDBMutex;
> >+  
> >+  /**
> >+   * The instant at which the request was started.
> >+   *
> >+   * Used by telemetry.
> >+   */
> >+  PRTime mRequestStartDate;
> 
> Use TimeStamp in C++ code.

Ok.

> >+HISTOGRAM(MOZ_STORAGE_ASYNC_REQUESTS_MS, 1, 32768, 50, LINEAR, "mozStorage async requests completion (ms)")
> 
> Should be exponential and use 10(or maybe 20) buckets. We don't need
> super-high precision here just a ballpark figure of how long we are waiting.
> What's 32768?

Ok.

From what I understand of the source code of telemetry, that's an upper bound on the measure. Here, about 32 seconds.
> > 
> > This is too fragile. Make a minimal RAII class and use destructor to
> > finalize. 
> 
> Ok. For the moment, I only attempted to track successful transactions, I
> will add RAII and track also those that fail.

Sounds like we should supplement this with a binary histogram to track how many transactions fail.

> 
> From what I understand of the source code of telemetry, that's an upper
> bound on the measure. Here, about 32 seconds.

http://mxr.mozilla.org/mozilla-central/source//ipc/chromium/src/base/histogram.h#266 is the only max I can think of. Though if it turns out we have to measure numbers this high, we should just switch to seconds :)
(In reply to Taras Glek (:taras) from comment #6)
> > > 
> > > This is too fragile. Make a minimal RAII class and use destructor to
> > > finalize. 
> > 
> > Ok. For the moment, I only attempted to track successful transactions, I
> > will add RAII and track also those that fail.
> 
> Sounds like we should supplement this with a binary histogram to track how
> many transactions fail.

I'll do that.

> > 
> > From what I understand of the source code of telemetry, that's an upper
> > bound on the measure. Here, about 32 seconds.
> 
> http://mxr.mozilla.org/mozilla-central/source//ipc/chromium/src/base/
> histogram.h#266 is the only max I can think of. Though if it turns out we
> have to measure numbers this high, we should just switch to seconds :)

Well, I traced it to that one http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/histogram.h#553 and then assumed that 32 seconds was a reasonable upper bound :) More generally, which upper bound do you have in mind?
Updated to take account tglek's remarks.
Attachment #568612 - Attachment is obsolete: true
Attachment #569065 - Flags: review?(tglek)
Comment on attachment 569065 [details] [diff] [review]
Telemetry probe for async query completion (v2)

>+FinallySendExecutionDuration::FinallySendExecutionDuration(TimeStamp aStart): mStart(aStart)
>+{

Sorry, I forgot to mention, we have http://mxr.mozilla.org/mozilla-central/source/toolkit/components/telemetry/Telemetry.h#84 Telemetry::AutoTimer for this.
Can you just add a second constructor that takes aStart parameter?



You should just do Timetamp::Now() here instead of passing it as an arg.

>   bool statementsNeedTransaction();
>-
>+  

Watch your whitespace.


Rest is of the patch is ok
Attachment #569065 - Flags: review?(tglek) → review-
Fixes applied.
Attachment #569065 - Attachment is obsolete: true
Attachment #569315 - Flags: review?(tglek)
Attachment #569315 - Flags: review?(tglek) → review+
Whiteboard: inb
https://hg.mozilla.org/mozilla-central/rev/8dc09fe4d1b9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
Comment on attachment 569315 [details] [diff] [review]
Telemetry probe for async query completion (v3)

>     if (rc == SQLITE_DONE)
>+    {

{ should not be on a separate line. Please keep that in mind.
fyi, some data from these
(MOZ_STORAGE_ASYNC_REQUESTS_MS,0)	9.9341854E7
(MOZ_STORAGE_ASYNC_REQUESTS_MS,10560)	34828.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,1097)	246305.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,114)	1367235.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,1)	1.6120877E7
(MOZ_STORAGE_ASYNC_REQUESTS_MS,12)	5147187.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,18602)	8294.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,1932)	171471.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,201)	966396.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,21)	4144679.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,2)	6599493.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,32768)	30348.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,3403)	90193.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,354)	609614.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,37)	3472627.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,4)	4493102.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,5995)	55101.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,623)	358374.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,65)	2217490.0
(MOZ_STORAGE_ASYNC_REQUESTS_MS,7)	3953052.0
(MOZ_STORAGE_ASYNC_REQUESTS_SUCCESS,0)	8991.0
(MOZ_STORAGE_ASYNC_REQUESTS_SUCCESS,1)	1.49419566E8
(MOZ_STORAGE_ASYNC_REQUESTS_SUCCESS,2)	0.0

It's not looking good
How should I read this?
(In reply to David Rajchenbach Teller [:Yoric] from comment #15)
> How should I read this?

(name, bucket) entries

Means a lot of people are waiting >32seconds for places to do stuff
Note these are only pings from nightlies from Oct 30th.
(In reply to Taras Glek (:taras) from comment #16)
> (In reply to David Rajchenbach Teller [:Yoric] from comment #15)
> > How should I read this?
> 
> (name, bucket) entries
> 
> Means a lot of people are waiting >32seconds for places to do stuff

This is for any Storage users, included add-ons and Sync, right?
Remember this also includes VACUUM, and some other queries that are slow by design and are indeed executed on idle, nothing that should take that much time, btw (apart vacuum that may last multiseconds, obviously). Results seem puzzling, we need some way to report back SQL queries that take much time, anything above 100ms that does not happen on idle is unexpected and very likely fixable.
fwiw, on my local about:telemetry I have 1 request taking 1s, any other request is below 600ms, vast majority is below 100ms, avg is 1.4ms
Blocks: 699051
(In reply to Marco Bonardo [:mak] from comment #19)
> fwiw, on my local about:telemetry I have 1 request taking 1s, any other
> request is below 600ms, vast majority is below 100ms, avg is 1.4ms

Note i'm on an ssd(windows), i have queries taking 3seconds in about:telemetry. This + mutexes could be very fun even if on background thread.
Absolutely, that's why we need a way to figure out which queries are these and why you see higher times than me (I'm on a common sata hard drive), a simple initial approach may be bug 694015, at least we could get a log in debug builds.
(In reply to Marco Bonardo [:mak] from comment #21)
> Absolutely, that's why we need a way to figure out which queries are these
> and why you see higher times than me (I'm on a common sata hard drive), a
> simple initial approach may be bug 694015, at least we could get a log in
> debug builds.

I filed bug 699051 for it, should move discussion there. Do you have a patch for 694015 we can base 699051 or should I DUP the bug I filed
Duplicate of this bug: 694015
Comment on attachment 569315 [details] [diff] [review]
Telemetry probe for async query completion (v3)

Review of attachment 569315 [details] [diff] [review]:
-----------------------------------------------------------------

I really don't appreciate code being changed in my component (storage) filed in the wrong bugzilla component (places) so I have no chance of seeing it.  At least cc me, or send a review request my way.

The only reason I found it is because mak filed bug 694015 (in the right component) and it got duped here.  Please fix my review comments.

::: storage/src/mozStorageAsyncStatementExecution.cpp
@@ +236,5 @@
>  , mState(PENDING)
>  , mCancelRequested(false)
>  , mMutex(aConnection->sharedAsyncExecutionMutex)
>  , mDBMutex(aConnection->sharedDBMutex)
> +  , mRequestStartDate(TimeStamp::Now())

Please keep the style consistent.

@@ +367,5 @@
>  
>      int rc = stepStmt(aStatement);
>      // Stop if we have no more results.
>      if (rc == SQLITE_DONE)
> +    {

Bracing style is inconsistent with both Mozilla style and storage style guides.

@@ +375,4 @@
>  
>      // If we got results, we can return now.
>      if (rc == SQLITE_ROW)
> +    {

ditto

::: storage/src/mozStorageAsyncStatementExecution.h
@@ +259,5 @@
> +   * The instant at which the request was started.
> +   *
> +   * Used by telemetry.
> +   */
> +  TimeStamp mRequestStartDate;

This is a time, not a date, and the variable name should reflect that.
(In reply to Shawn Wilsher :sdwilsh from comment #24)
> Comment on attachment 569315 [details] [diff] [review] [diff] [details] [review]
> Telemetry probe for async query completion (v3)
> 
> Review of attachment 569315 [details] [diff] [review] [diff] [details] [review]:
> -----------------------------------------------------------------
> 
> I really don't appreciate code being changed in my component (storage) filed
> in the wrong bugzilla component (places) so I have no chance of seeing it. 
> At least cc me, or send a review request my way.

Please accept my apologies, I must have gotten that wrong.

> ::: storage/src/mozStorageAsyncStatementExecution.cpp
> @@ +236,5 @@
> >  , mState(PENDING)
> >  , mCancelRequested(false)
> >  , mMutex(aConnection->sharedAsyncExecutionMutex)
> >  , mDBMutex(aConnection->sharedDBMutex)
> > +  , mRequestStartDate(TimeStamp::Now())
> 
> Please keep the style consistent.

I don't quite understand. What exactly should I have done?

> 
> @@ +367,5 @@
> >  
> >      int rc = stepStmt(aStatement);
> >      // Stop if we have no more results.
> >      if (rc == SQLITE_DONE)
> > +    {
> 
> Bracing style is inconsistent with both Mozilla style and storage style
> guides.

Ok, my bad.

> 
> @@ +375,4 @@
> >  
> >      // If we got results, we can return now.
> >      if (rc == SQLITE_ROW)
> > +    {
> 
> ditto
> 
> ::: storage/src/mozStorageAsyncStatementExecution.h
> @@ +259,5 @@
> > +   * The instant at which the request was started.
> > +   *
> > +   * Used by telemetry.
> > +   */
> > +  TimeStamp mRequestStartDate;
> 
> This is a time, not a date, and the variable name should reflect that.

Is "time" unambiguous in English? I tend to favor "date" because it definitely represents an instant, rather than a duration. However, I can certainly rename the field "mRequestStartTimeStamp".

Thanks for the additional review.
Same patch, with the changes suggested above.
Attachment #572454 - Flags: review?(sdwilsh)
Bah.  Somehow I didn't get cc'd on this when I commented, hence the missed review request...
Comment on attachment 572454 [details] [diff] [review]
Telemetry probe for async query completion (v4)

Review of attachment 572454 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking care of this.  r=sdwilsh

::: storage/src/mozStorageAsyncStatementExecution.cpp
@@ +16,5 @@
>   * The Original Code is mozilla.org code.
>   *
>   * The Initial Developer of the Original Code is
>   * the Mozilla Foundation.
> + * Portions created by the Initial Developer are Copyright (C) 2008-2011

FWIW, I don't think we need to actually update that.
Attachment #572454 - Flags: review?(sdwilsh) → review+
You need to log in before you can comment on or make changes to this bug.