Last Comment Bug 770264 - HTTP Connection Diagnostics
: HTTP Connection Diagnostics
Status: RESOLVED FIXED
[spdy]
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 16 Branch
: x86_64 Linux
: -- enhancement (vote)
: mozilla16
Assigned To: Patrick McManus [:mcmanus] PTO until Sep 6
:
Mentors:
Depends on: 775508
Blocks: 762162
  Show dependency treegraph
 
Reported: 2012-07-02 11:09 PDT by Patrick McManus [:mcmanus] PTO until Sep 6
Modified: 2012-07-27 09:57 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
verified


Attachments
sample report (52.45 KB, text/plain)
2012-07-02 11:32 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
no flags Details
patch 0 (18.03 KB, patch)
2012-07-02 11:45 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
honzab.moz: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta-
Details | Diff | Splinter Review
aurora version (17.98 KB, patch)
2012-07-02 11:45 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
no flags Details | Diff | Splinter Review
beta version (16.29 KB, patch)
2012-07-02 11:46 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
no flags Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:09:48 PDT
bugs like 762162 demand a HTTP diagnostic other than the NSPR log. The log can only be made by restarting firefox (or by a developer with really keen debugger skills) which can have the side effect of clearing the error condition.. furthermore the error may take a long time to trigger and in those cases the size of the speed-cost of the log may make its use prohibitive.

765849 proposes a about:net information page - which is a good idea. There is a summer of code project looking at combining that with some simple diagnostics. Something like that should be the end goal.

But I need some of this info now to solve the other bugs.

This bug proposes (and implements) a quick and dirty data dump that can be used in the interim to satisfy the immediate need for some of this information.

The mecahnism it uses is crude (a dump to javascript console triggered by a pref change), but it should be extremely resistant to regression problems and can therefore be safely backported. The data will need to be cut and pasted and post processed, but the key pieces should be there. No changes to existing code paths or data structures are made.
Comment 1 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:32:37 PDT
Created attachment 638426 [details]
sample report
Comment 2 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:45:10 PDT
Created attachment 638432 [details] [diff] [review]
patch 0

I'd love to get this reviewed as quick as possible to increase its chances of being accepted for backport.

I acknowledge its crude, but my goal here was to try and avoid adding any code that wasn't basically just a printf of an existing data structure. Other than the prototypes the code even manages to basically live in a new file.

To try it out, just open the javascript console - make sure it is showing the warning level and then flip network.http.diagnostics to true. (you can get a new report by going back to false and then true again - it prints each time it transitions to true.)
Comment 3 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:45:52 PDT
Created attachment 638433 [details] [diff] [review]
aurora version

an aurora (ff15) version
Comment 4 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:46:21 PDT
Created attachment 638434 [details] [diff] [review]
beta version

a beta (ff14) version
Comment 5 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-02 11:47:15 PDT
builds will be here https://tbpl.mozilla.org/?tree=Try&rev=83c747dab57b
Comment 6 Honza Bambas (:mayhemer) 2012-07-02 14:57:37 PDT
Comment on attachment 638432 [details] [diff] [review]
patch 0

Review of attachment 638432 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab with few nits.

::: netwerk/protocol/http/ConnectionDiagnostics.cpp
@@ +58,5 @@
> +                              self->AtActiveConnectionLimit(ent, NS_HTTP_ALLOW_KEEPALIVE));
> +  self->mLogData.AppendPrintf("   RestrictConnections = %d\n",
> +                              self->RestrictConnections(ent));
> +  self->mLogData.AppendPrintf("   Pending Q Length = %d\n",
> +                              ent->mPendingQ.Length());

It should be %u, nsTArray::Length is unsigned.

@@ +72,5 @@
> +                              ent->mUsingSpdy, ent->mTestedSpdy, ent->mSpdyPreferred);
> +  self->mLogData.AppendPrintf("   pipelinestate = %d penalty = %d\n",
> +                              ent->mPipelineState, ent->mPipeliningPenalty);
> +  for (i = 0; i < nsAHttpTransaction::CLASS_MAX; ++i) {
> +    self->mLogData.AppendPrintf("   pipeline per class penalty %u %d\n",

Be consistent with how you log the class (classification).  On other place you are logging it as %x.  Choose only one.

@@ +76,5 @@
> +    self->mLogData.AppendPrintf("   pipeline per class penalty %u %d\n",
> +                                i, ent->mPipeliningClassPenalty[i]);
> +  }
> +  for (i = 0; i < ent->mActiveConns.Length(); ++i) {
> +    self->mLogData.AppendPrintf("   :: Active Connection #%d\n", i);

#%u ?

@@ +88,5 @@
> +    self->mLogData.AppendPrintf("   :: Half Open #%d\n", i);
> +    ent->mHalfOpens[i]->PrintDiagnostics(self->mLogData);
> +  }
> +    
> +  return PL_DHASH_NEXT;

Log also pending transactions (request path, classification, caps, restart count?)

@@ +103,5 @@
> +  if (mPrimarySynStarted.IsNull())
> +    log.AppendPrintf("    primary not started\n");
> +  else
> +    log.AppendPrintf("    primary started %ldms ago\n",
> +                     (mPrimarySynStarted - now).ToMilliseconds());

This must be logged as e.g. %.2fms - ToMilliseconds() returns double.

@@ +112,5 @@
> +    log.AppendPrintf("    backup started %ldms ago\n",
> +                     (mBackupSynStarted - now).ToMilliseconds());
> +    
> +  log.AppendPrintf("    primary transport %p, backup transport %p\n",
> +                   mSocketTransport.get(), mBackupTransport.get());

I'd rather just expose the information about whether the transport is present or not (i.e. %d with !!m*Transport, or so).  As I understand, the log doesn't have a different purpose.

@@ +130,5 @@
> +  log.AppendPrintf("    iskeepalive = %d  dontReuse = %d isReused = %d\n",
> +                   IsKeepAlive(), mDontReuse, mIsReused);
> +
> +  log.AppendPrintf("    mTransaction = %p mSpdySession = %p\n",
> +                   mTransaction.get(), mSpdySession.get());

As well here.

@@ +141,5 @@
> +                   mMaxBytesRead, mTotalBytesRead, mTotalBytesWritten);
> +    
> +  log.AppendPrintf("    rtt = %ums\n", PR_IntervalToMilliseconds(mRtt));
> +
> +  log.AppendPrintf("    idlemonitoring = %d transactionCount=%d\n",

transactionCount=%u

@@ +178,5 @@
> +  log.AppendPrintf("     Ping Timeout = %ums\n",
> +                   PR_IntervalToMilliseconds(gHttpHandler->SpdyPingTimeout()));
> +  log.AppendPrintf("     Idle for Any Activity (ping) = %ums\n",
> +                   PR_IntervalToMilliseconds(now - mLastReadEpoch));    
> +  log.AppendPrintf("     Idle for Data Activity = %umsn",

%ums\n ?

@@ +215,5 @@
> +  log.AppendPrintf("     Ping Timeout = %ums\n",
> +                   PR_IntervalToMilliseconds(gHttpHandler->SpdyPingTimeout()));
> +  log.AppendPrintf("     Idle for Any Activity (ping) = %ums\n",
> +                   PR_IntervalToMilliseconds(now - mLastReadEpoch));    
> +  log.AppendPrintf("     Idle for Data Activity = %umsn",

%ums\n

::: netwerk/protocol/http/nsHttpConnectionMgr.h
@@ +395,5 @@
>          void SetSpeculative(bool val) { mSpeculative = val; }
>  
>          bool HasConnected() { return mHasConnected; }
>  
> +        void PrintDiagnostics (nsCString &log);

Space between PrintDiagnostics and (

@@ +607,5 @@
>                                               nsAutoPtr<nsConnectionEntry> &ent,
>                                               void *closure);
> +
> +    // For diagnostics
> +    void OnMsgPrintDiagnostics     (PRInt32, void *);

Spaces.

@@ +608,5 @@
>                                               void *closure);
> +
> +    // For diagnostics
> +    void OnMsgPrintDiagnostics     (PRInt32, void *);
> +    static PLDHashOperator PrintDiagnosticsCB (const nsACString &key,

Space.

::: netwerk/protocol/http/nsHttpHandler.cpp
@@ +1148,5 @@
> +    if (PREF_CHANGED(HTTP_PREF("diagnostics"))) {
> +        rv = prefs->GetBoolPref(HTTP_PREF("diagnostics"), &cVar);
> +        if (NS_SUCCEEDED(rv) && cVar) {
> +            if (mConnMgr)
> +                mConnMgr->PrintDiagnostics();

You should ignore this when called to load all prefs (with pref == null).  Otherwise it will print out on startup (may cause unexpected trouble and is not necessary ; also people could mistakenly send the first report...).
Comment 7 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-09 15:10:26 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/d9351d6dd5e3
Comment 8 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-09 17:10:49 PDT
Comment on attachment 638432 [details] [diff] [review]
patch 0

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 

This is a diagnostic only feature meant to help solve HTTP connection bugs - specifically spdy bugs 762126 762025. 

It was designed to be as safe to backport as possible. The diagnostic mechanism is dump to the JS error console, which obviates the previous need to have started firefox with NSPR logging. When the problem is witnessed a diagnostic can be produced by changing a about:config pref at that time and the information copied out of the javascript console.

User impact if declined: 

continued lack of insight into the aforementioned bugs.

Testing completed (on m-c, etc.):  on m-c

Risk to taking this patch (and alternatives if risky): 

designed to be very low. Unless the pref (network.http.diagnositics) is flipped to a non-default value no new code is executed.

String or UUID changes made by this patch: 

the diagnostic strings are not localized (they are generally code level diagnostic names), no uuid changes are made.
Comment 9 Alex Keybl [:akeybl] 2012-07-09 18:07:53 PDT
Comment on attachment 638432 [details] [diff] [review]
patch 0

[Triage Comment]
It's too late in the cycle to take a forward change of this nature for FF14 (our final beta build is about to be kicked off). FF15 will be on beta in about a week, so we can pick up feedback from that channel then. Approved for Aurora 15.
Comment 10 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-10 05:38:19 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/ccfb28dbae8b
Comment 11 Ryan VanderMeulen [:RyanVM] 2012-07-10 15:49:19 PDT
Target milestone reflects when it was fixed on trunk. Use the status flags for Aurora/Beta.

https://hg.mozilla.org/mozilla-central/rev/d9351d6dd5e3
Comment 12 Ioana (away) 2012-07-27 09:22:02 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0 (20120724191344)

I tried to verify this fix by the guidelines in comment 2, but nothing was displayed in the console when flipping network.http.diagnostics.

I tried with the Web Console, Error Console and Firebug. Made sure warning level was showing but I still didn't get anything displayed.

Patrick, which console are you referring to by "javascript console"? Should I change any other prefs, or perform any other actions to see the logged information?
Comment 13 Patrick McManus [:mcmanus] PTO until Sep 6 2012-07-27 09:42:43 PDT
(In reply to Ioana Budnar [QA] from comment #12)
> Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0
> (20120724191344)
> 
>> 
> Patrick, which console are you referring to by "javascript console"? 

error console - ctrl shift j

Should
> I change any other prefs, or perform any other actions to see the logged
> information?

flip the pref from false to true. (going the other way has no impact)

I just reconfirmed it works fine on both linux and windows. I've even used it to fix a bug in something else.
Comment 14 Ioana (away) 2012-07-27 09:57:41 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0 (20120724191344)

I verified this again, on a fresh profile, by checking the Error Console (Messages section, not Warnings) and it seems to work ok. 

It still doesn't work on my old profile, but that is a pretty dirty one. I disabled all my 16 addons and it still reproduces. When I flip the pref to true I usually get this in the Error Console: 

Timestamp: 07/27/2012 07:55:26 PM
Warning: ReferenceError: reference to undefined property b.view
Source File: chrome://global/content/bindings/tree.xml
Line: 1038

I will investigate this and file a new issue if I can find any way of reproducing it a second time.

Note You need to log in before you can comment on or make changes to this bug.