650143 - Necko telemetry: make list of all stats we want to track

Reporter

Description

13 years ago

So we need to figure out what we want to track for necko telemetry. Once we've got that list, we also need to figure out 1) any privacy issues; 2) whether we already collect the data in necko or need to start collecting it; and 3) how to expose JS APIs for the stats so the telemetry code can collect them. We can break implementation out into separate bugs; let's keep this one focused on what we want to collect (and any privacy issues).

The best list I've seen so far is Patrick's, below. We probably also want to collect aggregate stats from the web timing spec (see bug 576006).

----
A (perhaps partial) wishlist:

Distributions of RTTs are a great way to get started, crosstabbed
against hardware type as best we can (wired, 802.11, non-802.11
wireless), probably measured as handshake time.
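One way to hold an RTT distribution crosstabbed by link type is a separate exponential-bucket histogram per link type. The sketch below is a minimal Python illustration under stated assumptions: the power-of-two millisecond buckets, the link-type labels (taken from the wishlist wording), and the class name `RttHistogram` are all hypothetical, not necko or Telemetry code.

```python
import bisect
from collections import defaultdict

# Assumed bucket lower bounds in milliseconds: 0, then powers of two up to 8192.
# A real probe would choose its own range and bucket count.
BUCKETS_MS = [0] + [2 ** i for i in range(14)]

# Hypothetical link-type labels, taken from the wishlist wording.
LINK_TYPES = ("wired", "802.11", "non-802.11 wireless", "unknown")


class RttHistogram:
    """Handshake-RTT histogram, crosstabbed by link type."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0] * len(BUCKETS_MS))

    def record(self, link_type, rtt_ms):
        if link_type not in LINK_TYPES:
            link_type = "unknown"
        rtt_ms = max(rtt_ms, 0)
        # Index of the last bucket whose lower bound is <= rtt_ms; values
        # above the top bound land in the last (overflow) bucket.
        index = bisect.bisect_right(BUCKETS_MS, rtt_ms) - 1
        self.counts[link_type][index] += 1

    def bucket_counts(self, link_type):
        """Map each bucket's lower bound (ms) to its sample count."""
        return dict(zip(BUCKETS_MS, self.counts[link_type]))


# Example: handshakes over Wi-Fi, Ethernet, and an unrecognized link type.
hist = RttHistogram()
hist.record("802.11", 40.0)
hist.record("wired", 3.0)
hist.record("wired", 100000.0)
hist.record("carrier pigeon", 5.0)
```

Geometric buckets keep the resolution high where most handshakes fall (tens of milliseconds) while still capturing long-tail outliers in a bounded number of buckets.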

A series of timestamped events containing major UI events (such as tab
switching) along with necko requests, the eventual sizes and types of
those responses, and when chunks of them were consumed by the DOM. That's
really interesting from a scheduling perspective: I've always thought
there might be some interesting cases where all N parallel connections
are active and a request for something really high priority (e.g. CSS
for the viewable tab) comes in. It might well be worth pausing one or
more of the active connections and starting a new one for the CSS; with
the data we would at least have a model for evaluating that.

Distributions of XHR frequency, XHR latency compared to handshake
RTT, and XHR transfer size.

Make a guess at the server's congestion window peaks and try to figure
out whether they are bound by bandwidth or by congestion control. Bundle
that together with RTT and the size data, and we can at least figure out
the best-case bound for certain scenarios.

Our receive window (rwin).

When we get a non-idempotent method, what else is going on temporally?
Are they so uncommon that we just shouldn't mix them into the pconn pool?
What are their transfer rates? That seems like a simple question, but
due to an undersized send buffer on Windows XP and earlier, Firefox had
horrible upload rates as recently as 2 years ago.

Number of parallel connections per host and per tab.

Fraction of connections and hosts that are persistent-connection and/or
pipeline eligible, and the fraction that actually use them.
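The eligibility-versus-use split can be computed from per-connection summaries. This is a minimal Python sketch, assuming a hypothetical `ConnRecord` with boolean flags (not a necko structure), and assuming "use" is measured only among eligible connections, which is one reading of the wishlist item.

```python
from dataclasses import dataclass


@dataclass
class ConnRecord:
    """Hypothetical per-connection summary; field names are illustrative."""
    host: str
    pconn_eligible: bool      # server allows keep-alive
    pipeline_eligible: bool   # connection judged safe to pipeline
    reused: bool              # actually carried more than one transaction
    pipelined: bool           # actually pipelined at least once


def fraction(items, pred):
    """Share of items satisfying pred (0.0 for an empty collection)."""
    items = list(items)
    return sum(1 for x in items if pred(x)) / len(items) if items else 0.0


def eligibility_report(conns):
    hosts = {c.host for c in conns}
    pconn_ok = [c for c in conns if c.pconn_eligible]
    pipe_ok = [c for c in conns if c.pipeline_eligible]
    return {
        "conns_pconn_eligible": fraction(conns, lambda c: c.pconn_eligible),
        "conns_pipeline_eligible": fraction(conns, lambda c: c.pipeline_eligible),
        # A host counts as eligible if any of its connections was.
        "hosts_pconn_eligible": fraction(
            hosts, lambda h: any(c.host == h for c in pconn_ok)),
        # Use is measured only among eligible connections.
        "pconn_used_when_eligible": fraction(pconn_ok, lambda c: c.reused),
        "pipeline_used_when_eligible": fraction(pipe_ok, lambda c: c.pipelined),
    }


report = eligibility_report([
    ConnRecord("a.example", True, True, True, False),
    ConnRecord("a.example", True, False, False, False),
    ConnRecord("b.example", False, False, False, False),
    ConnRecord("c.example", True, True, True, True),
])
```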

Lifetime of idle (and, separately, all) persistent connections, along
with information on who closes them; rate of unexpected pconn reuse
failures and subsequent reschedules.

DNS lookup latencies, and retry and failure rates.

DNS cache hit rates (both normal and prefetched).
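A "prefetched" hit rate only makes sense for hits, so one option is to classify every cache lookup as a miss, a hit on an entry from a normal lookup, or a hit on an entry populated by a DNS prefetch. The Python sketch below assumes the resolver can tell which path populated an entry; the `DnsLookup` categories and `DnsCacheStats` class are hypothetical, not the necko DNS service API.

```python
from enum import Enum


class DnsLookup(Enum):
    """Outcome of one host-resolver cache lookup (illustrative categories)."""
    MISS = "miss"
    HIT_NORMAL = "hit on an entry from a normal lookup"
    HIT_PREFETCHED = "hit on an entry from a DNS prefetch"


class DnsCacheStats:
    def __init__(self):
        self.counts = {outcome: 0 for outcome in DnsLookup}

    def record(self, outcome):
        self.counts[outcome] += 1

    def rates(self):
        """Share of all lookups falling into each outcome."""
        total = sum(self.counts.values())
        if total == 0:
            return {outcome: 0.0 for outcome in DnsLookup}
        return {outcome: n / total for outcome, n in self.counts.items()}


stats = DnsCacheStats()
for outcome in (DnsLookup.MISS, DnsLookup.HIT_NORMAL,
                DnsLookup.HIT_NORMAL, DnsLookup.HIT_PREFETCHED):
    stats.record(outcome)
```

The overall hit rate is then the sum of the two hit shares, and the prefetched share shows how much prefetching contributes on its own.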

Peak and typical queue sizes for both HTTP transactions and DNS lookups,
along with their arrival patterns.
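"Peak" and "typical" queue size can both be tracked cheaply if "typical" is taken to mean the time-weighted mean length (a median would be another reasonable choice). The Python sketch below also records inter-arrival gaps for the arrival pattern. It assumes caller-supplied timestamps in seconds; the `QueueStats` class is hypothetical.

```python
class QueueStats:
    """Peak and time-weighted mean length of a queue, plus inter-arrival gaps.

    The caller invokes enqueue()/dequeue() with a timestamp in seconds
    whenever an HTTP transaction or DNS lookup joins or leaves the queue."""

    def __init__(self, start_time=0.0):
        self.start = start_time
        self.last_change = start_time
        self.length = 0
        self.peak = 0
        self.area = 0.0            # integral of queue length over time
        self.last_arrival = None
        self.interarrival = []     # seconds between consecutive arrivals

    def _advance(self, now):
        self.area += self.length * (now - self.last_change)
        self.last_change = now

    def enqueue(self, now):
        self._advance(now)
        if self.last_arrival is not None:
            self.interarrival.append(now - self.last_arrival)
        self.last_arrival = now
        self.length += 1
        self.peak = max(self.peak, self.length)

    def dequeue(self, now):
        if self.length == 0:
            raise ValueError("dequeue from an empty queue")
        self._advance(now)
        self.length -= 1

    def mean_length(self, now):
        """Time-weighted average length from start_time up to now."""
        self._advance(now)
        elapsed = now - self.start
        return self.area / elapsed if elapsed > 0 else float(self.length)


q = QueueStats(start_time=0.0)
q.enqueue(0.0)
q.enqueue(1.0)
q.dequeue(2.0)
q.dequeue(4.0)
```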

Hit rates and eviction history for the disk cache.

Fraction of transactions that are cancelled, and the reason (DOM
cancelled it, timed out, stalled, etc.).
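Cancellation stats reduce to a completed counter plus a per-reason counter. This Python sketch uses the reasons named in the item, with a hypothetical `OTHER` bucket standing in for the "etc."; the enum and class names are illustrative only.

```python
from collections import Counter
from enum import Enum


class CancelReason(Enum):
    # Reasons named in the wishlist; OTHER stands in for the "etc.".
    DOM_CANCELLED = "DOM cancelled it"
    TIMED_OUT = "timed out"
    STALLED = "stalled"
    OTHER = "other"


class TransactionOutcomes:
    def __init__(self):
        self.completed = 0
        self.cancelled = Counter()

    def record_completed(self):
        self.completed += 1

    def record_cancelled(self, reason):
        self.cancelled[reason] += 1

    def cancel_fraction(self):
        n_cancelled = sum(self.cancelled.values())
        total = self.completed + n_cancelled
        return n_cancelled / total if total else 0.0

    def reason_breakdown(self):
        """Share of cancellations attributed to each reason."""
        n_cancelled = sum(self.cancelled.values())
        if n_cancelled == 0:
            return {}
        return {r: self.cancelled[r] / n_cancelled for r in CancelReason}


outcomes = TransactionOutcomes()
for _ in range(3):
    outcomes.record_completed()
for reason in (CancelReason.DOM_CANCELLED, CancelReason.DOM_CANCELLED,
               CancelReason.TIMED_OUT):
    outcomes.record_cancelled(reason)
```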

Rates of gzip encoding and of SSL, and the impact of each on latency or
transfer time compared with other documents of similar size and RTT.
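Comparing "documents of similar size and RTT" suggests grouping responses into (size, RTT) buckets and comparing mean transfer time with and without the feature inside each bucket. A minimal Python sketch, assuming power-of-two size buckets, 50 ms RTT buckets, and hypothetical sample field names (`bytes`, `rtt_ms`, `transfer_ms`, plus a boolean flag such as `gzip` or `ssl`):

```python
from collections import defaultdict
from statistics import mean


def size_bucket(nbytes):
    # Assumed bucketing: powers of two of the response size.
    return max(nbytes, 1).bit_length()


def rtt_bucket(rtt_ms):
    # Assumed bucketing: 50 ms-wide RTT bands.
    return int(rtt_ms // 50)


def compare_by_feature(samples, feature):
    """Return {(size_bucket, rtt_bucket): (mean_with, mean_without)} transfer
    times, keeping only buckets that contain samples of both kinds."""
    groups = defaultdict(lambda: ([], []))
    for s in samples:
        key = (size_bucket(s["bytes"]), rtt_bucket(s["rtt_ms"]))
        with_f, without_f = groups[key]
        (with_f if s[feature] else without_f).append(s["transfer_ms"])
    return {k: (mean(w), mean(wo)) for k, (w, wo) in groups.items() if w and wo}


samples = [
    {"bytes": 1000, "rtt_ms": 20, "transfer_ms": 30, "gzip": True},
    {"bytes": 1000, "rtt_ms": 30, "transfer_ms": 50, "gzip": False},
    {"bytes": 1000, "rtt_ms": 10, "transfer_ms": 70, "gzip": False},
    {"bytes": 10 ** 6, "rtt_ms": 20, "transfer_ms": 500, "gzip": True},
]
comparison = compare_by_feature(samples, "gzip")
```

Buckets with only one kind of sample are dropped, since there is nothing to compare them against.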

Any IPv6 activity, and any successful IPv6 activity.

Rate of successful revalidations.

How many redirects go to the same hostname? How many go to the
same IP?
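Classifying a redirect needs the target resolved against the original URL (a `Location` may be relative) and a hostname-to-address lookup. The Python sketch below takes that lookup as a caller-supplied `resolve` function, an assumption standing in for whatever the DNS cache would provide; `classify_redirect` and `redirect_rates` are hypothetical helpers.

```python
from urllib.parse import urljoin, urlsplit


def classify_redirect(from_url, location, resolve):
    """Return (same_host, same_ip) for one redirect.

    `location` may be relative, so it is resolved against `from_url` first.
    `resolve` maps a hostname to a set of IP-address strings."""
    to_url = urljoin(from_url, location)
    from_host = urlsplit(from_url).hostname
    to_host = urlsplit(to_url).hostname
    same_host = from_host == to_host
    # Counts as "same IP" if the two hosts share any resolved address.
    same_ip = bool(resolve(from_host) & resolve(to_host))
    return same_host, same_ip


def redirect_rates(redirects, resolve):
    """Fractions of (from_url, location) redirects staying on the same host / IP."""
    results = [classify_redirect(f, loc, resolve) for f, loc in redirects]
    if not results:
        return 0.0, 0.0
    n = len(results)
    return sum(h for h, _ in results) / n, sum(i for _, i in results) / n


# Example with a fake resolver table (documentation-range addresses).
TABLE = {"example.com": {"192.0.2.1"}, "www.example.com": {"192.0.2.1"},
         "cdn.example.net": {"203.0.113.5"}}


def fake_resolve(host):
    return TABLE.get(host, set())
```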

I'm sure other people have other things they would like to know about
too, and not all of this has an immediate and obvious use, of course,
but having a full picture can be quite helpful when trying to explore
any particular theory. I've got a couple of colleagues in academia who
did some really useful work on characterization (though it is 10 years old:
http://www.amazon.com/Web-Protocols-Practice-Networking-Measurement/dp/0201710889), and I will ask them whether there is other data they think a study like this could usefully generate, and update this list accordingly.

Jason Duell

Reporter

Updated

13 years ago

Blocks: 650129

Jason Duell

Reporter

Comment 1

13 years ago

Also:

- Whether using a proxy (and what type: regular, SOCKS)

- Whether using a PAC file/URL.

Bjarne (:bjarne)

Comment 2

13 years ago

The "rates of gzip encoding" item mentioned in comment #0 is pretty relevant for bug #648429. It could be rephrased as "ratio of compressed vs. uncompressed files".

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Comment 3

13 years ago

I started a page on the wiki for collecting and organizing these:

https://wiki.mozilla.org/Necko:Telemetry

When you sign into the wiki, you can add a watch to the page to see updates.

Jason Duell

Reporter

Comment 4

13 years ago

Oh, right, duh! A wiki is much better than Bugzilla for this. Thanks!

(dormant account)

Comment 5

13 years ago

Bug 585196 has landed. See the cycle collector probe for an example of how to add timing probes.
Install http://people.mozilla.com/~tglek/telemetry/ping.telemetry.xpi and go to about:histograms to see your measurements. Let's get some probes landed ASAP.
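The general shape of a timing probe is: read a monotonic clock, run the code being measured, and accumulate the elapsed time into a named histogram. The Python sketch below illustrates only that pattern; `TimingProbe`, its `measure()` method, and the probe name are hypothetical and are not the telemetry API added in bug 585196.

```python
import time
from contextlib import contextmanager


class TimingProbe:
    """Hypothetical stand-in for a telemetry timing histogram."""

    def __init__(self, name):
        self.name = name
        self.samples_ms = []

    @contextmanager
    def measure(self):
        # Monotonic clock, so wall-clock adjustments can't yield negative times.
        start = time.monotonic()
        try:
            yield
        finally:
            # Recorded even if the measured code raises.
            self.samples_ms.append((time.monotonic() - start) * 1000.0)


probe = TimingProbe("EXAMPLE_DNS_LOOKUP_MS")
with probe.measure():
    sum(range(1000))  # stand-in for the code being timed
```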

Honza Bambas (:mayhemer)

Comment 6

•

13 years ago

(In reply to comment #5)
> go to about:histograms to see your measurements. 

The address is about:telemetry.

Honza Bambas (:mayhemer)

Updated

13 years ago

Blocks: 658894

(dormant account)

Updated

13 years ago

Blocks: 659396

Patrick McManus [:mcmanus]

Comment 7

8 years ago

closing idle trackers

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → INCOMPLETE

Categories

(Core :: Networking, defect)

People

(Reporter: jduell.mcbugs, Unassigned)

Security

(public)
