Closed Bug 653268 Opened 13 years ago Closed 12 years ago

[Snippet Service] Track snippet load numbers

Categories

(Snippets :: Service, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 719090

People

(Reporter: lforrest, Assigned: aphadke)

References

Details

Now that snippets are internally hosted, we'd like to get a better idea of how often each snippet is loaded.

This will:
- help us calculate a rough clickthrough rate for individual snippets so we can track and optimize over time
- help us understand who is viewing snippets so we can refine targeting by geo, version, etc.
- let us know how often the default snippet content is served (which equates to the service being down)

There may be a couple of different ways to implement this, with the final objective being some sort of reporting interface that can be exported/viewed on an ongoing basis.
Assignee: build → nobody
QA Contact: coop → webdev
Is there a reason we need absolute numbers here?  As long as we track how frequently snippets are shown (10%, 20%, etc.), we'll have enough information to optimize their display.
(In reply to comment #1)
> Is there a reason we need absolute numbers here?  As long as we track how
> frequently snippets are shown (10%, 20%, etc.), we'll have enough information
> to optimize their display.

Snippets rotate in and out and change often over time. We'd like to be able to compare without content having to be live side-by-side.
Assignee: nobody → malexis
This is going to be difficult (I'm guessing near impossible) because of the caching.

If you request /foo/bar and get snippet A, you are going to get that from cache until the cache expires.  Once the cache expires, the next hit may deliver snippet B from the backend, which would then be cached.  The caching server has no idea there is a difference between the two, and the log hits are all going to show up as /foo/bar.

Maybe Les or Daniel would have an idea as to how to get around this, but the whole point of caching is to avoid the load of "individual hits", which in turn is what you are asking to quantify here (in detail).
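To make the caching issue concrete, here is a minimal TypeScript sketch with entirely made-up names (not how the actual servers work): a URL-keyed cache logs the same path for every hit, no matter which snippet body was actually delivered, so per-snippet counts are lost at this layer.

// Hypothetical sketch: a URL-keyed cache in front of a rotating snippet backend.
type CacheEntry = { body: string; expiresAt: number };

const cache = new Map<string, CacheEntry>();
const snippets = ["<p>snippet A</p>", "<p>snippet B</p>"];
let rotation = 0;

function backendFetch(path: string): string {
  // The backend rotates which snippet it serves for the same path.
  return snippets[rotation++ % snippets.length];
}

function cachedFetch(path: string, now: number, ttlMs: number): string {
  // The access log only ever sees `path` (e.g. "/foo/bar"), regardless of
  // which snippet body is actually delivered.
  console.log(`GET ${path}`);
  const hit = cache.get(path);
  if (hit && hit.expiresAt > now) return hit.body;
  const body = backendFetch(path);
  cache.set(path, { body, expiresAt: now + ttlMs });
  return body;
}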
(In reply to comment #3)

> Maybe Les or Daniel would have an idea as to how to get around this, but the
> whole point of caching is to avoid the load of "individual hits" which in turn
> is what you are asking to quantify here (in detail)

Yeah, off the top of my head, we'd fire off an HTTP request to a metrics service somewhere when a snippet is revealed, whether via image or ajax request.

I'm not totally familiar with what we use for metrics these days, so I can't speak to how that would work exactly or if it could withstand traffic from about:home.

Another thing might be to track metrics on the client side, and then send off a collected report somewhere every time Fx contacts the snippet service. That would reduce the traffic, but I'm not sure if what we use for metrics would support that off-the-shelf. This starts to sound like something akin to crash reporter / socorro.
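A rough sketch of the first idea, where the metrics endpoint and parameter names are placeholders rather than an existing service: the snippet reports its own impression with an image beacon when it is shown, which sidesteps the cache in front of the snippet payload.

// Hypothetical impression ping, fired from the snippet itself when it is revealed.
function reportImpression(snippetId: string): void {
  const beacon = new Image();
  // metrics.example.com is a placeholder for whatever metrics service
  // would actually receive these hits.
  beacon.src =
    "https://metrics.example.com/impression?snippet=" +
    encodeURIComponent(snippetId) +
    "&t=" + Date.now(); // cache-buster so the hit always reaches the logs
}

// e.g. called when the snippet markup is injected into about:home:
reportImpression("HTML5-demo-3-snippet");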
(In reply to comment #4)
 
> Another thing might be to track metrics on the client side, and then send off a
> collected report somewhere every time Fx contacts the snippet service.

Oh, also: This scheme would require some changes to Fx itself, I think, if only to trigger a client-side metric report transmission on snippet fetch.
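And a rough sketch of the client-side variant, again with a placeholder endpoint (the actual transport and storage inside Fx would look different): impressions are counted locally and the accumulated batch is sent whenever the client next contacts the snippet service.

// Hypothetical client-side aggregation: count impressions locally, then send
// the batch when the snippet service is contacted for fresh content.
const counts: Record<string, number> = {};

function recordImpression(snippetId: string): void {
  counts[snippetId] = (counts[snippetId] ?? 0) + 1;
}

async function flushOnSnippetFetch(): Promise<void> {
  const report = { counts: { ...counts }, sentAt: new Date().toISOString() };
  // metrics.example.com is a placeholder endpoint.
  await fetch("https://metrics.example.com/report", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
  // Reset the local counters once the report has been accepted.
  for (const key of Object.keys(counts)) delete counts[key];
}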
(In reply to comment #4)
> Yeah, off the top of my head, we'd fire off an HTTP request to a metrics
> service somewhere when a snippet is revealed, whether via image or ajax
> request.
> 
> I'm not totally familiar with what we use for metrics these days, so I can't
> speak to how that would work exactly or if it could withstand traffic from
> about:home
> 
> Another thing might be to track metrics on the client side, and then send off a
> collected report somewhere every time Fx contacts the snippet service. That
> would reduce the traffic, but I'm not sure if what we use for metrics would
> support that off-the-shelf. This starts to sound like something akin to crash
> reporter / socorro

But even still, the whole reason we cache is to deal with the intense load of connections from every client out there.  The request is to track every connection, and these ideas still require taking a connection from every client somewhere.  In that case we may as well ditch caching and take every connection straight to the cluster (which is not a good idea: we would be in no position to handle such load, and doing so would be a huge cost for just a metrics gain).
(In reply to comment #6)

> But even still, the whole reason we cache is to deal with the intense load of
> connections from every client out there.  The request is to track every
> connection and these ideas still require taking a connection from every client
> somewhere.  In that case we may as well ditch caching and take every connection
> straight to the cluster  (which is not a good idea, we would be in no position
> to handle such load, and to do so would be a huge cost for just a metric gain)

Yup, all true. I don't want to trivialize the effort - I don't have any numbers, but it could be similar in magnitude to collecting Fx crash reports. (i.e. it might not be worth it.)
If I recall correctly, when we designed about:home we purposely used a URL scheme that was equivalent to what we used for the blocklist ping so that it would be (reasonably) straightforward to roll up reporting in the same way.  Given how the system works we're not going to be able to get load rates for specific snippets, but at least we can get the volume overall.
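One way that roll-up could look, assuming an Apache-style access-log format and a guessed snippet URL prefix (neither reflects the actual reporting pipeline): count snippet-fetch lines per day to get overall volume, without any per-snippet breakdown.

// Hypothetical roll-up: daily snippet-fetch volume from access logs.
// The log format and the "/1/Firefox/" prefix are assumptions for illustration.
function dailySnippetFetchCounts(logLines: string[]): Map<string, number> {
  const perDay = new Map<string, number>();
  for (const line of logLines) {
    if (!line.includes("GET /1/Firefox/")) continue; // assumed snippet URL prefix
    const dateMatch = line.match(/\[(\d{2}\/\w{3}\/\d{4})/); // e.g. [20/Apr/2011
    if (!dateMatch) continue;
    const day = dateMatch[1];
    perDay.set(day, (perDay.get(day) ?? 0) + 1);
  }
  return perDay;
}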
(In reply to comment #8)
> If I recall correctly, when we designed about:home we purposely used a URL
> scheme that was equivalent to what we used for the blocklist ping so that it
> would be (reasonably) straightforward to roll up reporting in the same way. 
> Given how the system works we're not going to be able to get load rates for
> specific snippets, but at least we can get the volume overall.

This is true, too. But I'm not sure that number will be very useful. It'll roughly be Firefox ADUs minus whoever didn't load about:home that day (e.g. they never reloaded it, or they set a different home page).

But definitely no per-snippet counts in that number.
(In reply to comment #8)
> If I recall correctly, when we designed about:home we purposely used a URL
> scheme that was equivalent to what we used for the blocklist ping so that it
> would be (reasonably) straightforward to roll up reporting in the same way. 
> Given how the system works we're not going to be able to get load rates for
> specific snippets, but at least we can get the volume overall.

We can easily track user hits; that's no problem at all.  We just can't easily tell who got what snippet.
It won't give us per-snippet counts directly, but we could infer at least a rough idea by making some assumptions.

My thinking was that if we know there were 1M user hits loading the snippet set, there are 4 snippets in random, equally weighted rotation, and we conservatively assume each user has at most 1 impression of the page between refreshes of the snippet set, then we can calculate that each snippet would have something in the range of 250K impressions.

It doesn't need to be perfect. Having a baseline that accounts for growth and is consistent would allow us to measure how conversion rates change as we work to tune and optimize the channel.  As it stands, we only see the click-through count, and that number will likely grow as the user base grows, so it doesn't tell us much about the channel's effectiveness.
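A small worked version of that estimate, with made-up snippet names and weights: each snippet's share of the total hits is its weight divided by the sum of all weights.

// Rough per-snippet impression estimate under the assumptions above:
// N total hits, k snippets in weighted rotation, at most one impression
// per user between refreshes of the snippet set.
function estimateImpressions(
  totalHits: number,
  weights: Record<string, number>,
): Record<string, number> {
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0);
  const estimates: Record<string, number> = {};
  for (const [name, weight] of Object.entries(weights)) {
    estimates[name] = Math.round(totalHits * (weight / totalWeight));
  }
  return estimates;
}

// 1M hits, 4 equally weighted snippets -> roughly 250K estimated impressions each.
console.log(estimateImpressions(1_000_000, { a: 1, b: 1, c: 1, d: 1 }));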
We'll be able to handle this request once we get the Metrics Data Collection Module project underway.  Basically, the project is very close to what was mentioned at the bottom of comment #4: a system that collects metrics client-side and sends them in periodically.

Before that service is online, it is unlikely we'll be able to do anything with the actual per-snippet counts.

We could look at implementing some of the metrics that Chris outlined in comment #11, but we need a lot more detail, such as examples of what the data would look like.  We also have to prioritize that work against several other projects we have in the queue for this quarter, including developing the DCM project I mentioned above.
Couldn't we just use Webtrends JavaScript to track click-through rates on the snippets if you consider each snippet an ad?

For example:

WT.ad=HTML5-demo-3-snippet -- for impression
WT.ac=HTML5-demo-3-snippet -- for click

WT.ac / WT.ad = click-through rate.

Regardless of whether the snippet is cached or not, a Webtrends call will be made, and thus impressions, click-through rates, and other valuable geo-location metrics will be stored for analysis.
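Independent of the Webtrends tag itself, the per-snippet arithmetic is just clicks over impressions; a tiny sketch with made-up numbers:

// Per-snippet click-through rate: clicks (WT.ac hits) over impressions (WT.ad hits).
function clickThroughRate(impressions: number, clicks: number): number {
  return impressions > 0 ? clicks / impressions : 0;
}

// e.g. 500 clicks on 120,000 impressions of "HTML5-demo-3-snippet" -> ~0.0042 (0.42%)
console.log(clickThroughRate(120_000, 500));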
Just found out the monthly pageviews for FF home (3.6), and that volume would overwhelm Webtrends for sure unless it is sampled down or we come up with another data collection method. Ignore my previous comment...
(In reply to comment #2)
> (In reply to comment #1)
> > Is there a reason we need absolute numbers here?  As long as we track how
> > frequently snippets are shown (10%, 20%, etc.), we'll have enough information
> > to optimize their display.
> 
> Snippets rotate in and out and change often over time. We'd like to be able
> to compare without content having to be live side-by-side.

As long as we track the start and end date of each snippet, it will be relatively simple to build a very strong proxy for each snippet's click-through rate.  Unless I'm missing something, doing so should capture most of the benefit at a fraction of the cost.
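One possible shape for that proxy, assuming we have daily click totals per snippet and daily about:home volume (all inputs and field names below are hypothetical): clicks recorded while the snippet was live, divided by the estimated impressions over the same window.

// Proxy click-through rate for a snippet: clicks during its live window over
// about:home volume in that window, scaled by the snippet's rotation share.
interface SnippetRun {
  name: string;
  start: string;          // ISO date the snippet went live, e.g. "2011-04-01"
  end: string;            // ISO date it was retired
  rotationShare: number;  // e.g. 0.25 if one of four equally weighted snippets
}

function proxyClickThroughRate(
  run: SnippetRun,
  dailyClicks: Record<string, number>,  // day -> clicks on this snippet's link
  dailyVolume: Record<string, number>,  // day -> total about:home snippet fetches
): number {
  let clicks = 0;
  let impressions = 0;
  for (const day of Object.keys(dailyVolume)) {
    if (day < run.start || day > run.end) continue; // ISO dates compare lexically
    clicks += dailyClicks[day] ?? 0;
    impressions += (dailyVolume[day] ?? 0) * run.rotationShare;
  }
  return impressions > 0 ? clicks / impressions : 0;
}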
(In reply to comment #15)

> As long as we track the start and end date of each snippet, it will be
> relatively simple to build a very strong proxy for each snippet's click
> through rate.  Unless I'm missing something, doing so should capture most of
> the benefit at a fraction of the cost.

I think the issue is that this bug is about snippet *impressions*, not click-throughs. We already measure click-throughs on snippet links, for the most part.
Assignee: malexis → aphadke
Depends on: 690881
Component: Webdev → Service
Product: mozilla.org → Snippets
Version: other → unspecified
The snippet tracking efforts from almost a year ago have since given us many of the metrics requested here. Marking as a duplicate of that project; if we need more stats we'll file a new bug.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE