Bug 1108668 - Telemetry experiment: determine the most popular Flash video sites
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Plug-ins
Version: unspecified
Platform: x86_64 Linux
Importance: -- normal
Target Milestone: ---
Assigned To: Benjamin Smedberg [:bsmedberg]
: Benjamin Smedberg [:bsmedberg]
Mentors:
Depends on: 1119291 1119302 1121686 1122537 1123888
Blocks: shumway-jw2
Reported: 2014-12-08 10:22 PST by Benjamin Smedberg [:bsmedberg]
Modified: 2015-05-26 11:48 PDT


Attachments
bootstrap.js (nsIContentPolicy implementation for logging) (3.99 KB, text/plain)
2015-01-08 11:39 PST, Benjamin Smedberg [:bsmedberg]
bzbarsky: review+
flash-videoloads (14.02 KB, patch)
2015-01-14 14:28 PST, Benjamin Smedberg [:bsmedberg]
felipc: review+

Description Benjamin Smedberg [:bsmedberg] 2014-12-08 10:22:05 PST
We know from user feedback and market research that problems watching videos are among the most common user complaints. Beyond YouTube, though, we don't know which video sites our users use most. We want to measure this using telemetry with the beta audience.

Technical details:
* This will be deployed as a telemetry experiment to a sample of the beta population
* I understand that Shumway was already planning on measuring codec usage by introspecting network requests made by Flash. Assuming that code already exists, we should use the same mechanism to introspect merely the fact that a video is being loaded. till/mbx, can you provide a pointer for this?
* When a video load is detected, we will create a payload with the following information and send it to the cloud services ingestion system:
** the domain of the video content: "videoOrigin"
** the domain of the .swf: "swfOrigin"
** the domain of the page loading the plugin: "pageOrigin"
** version/channel/buildid/locale
* This data will not be collected as part of the current telemetry session ping, because we don't want to associate the data with the user's telemetry profile ID.
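
A minimal sketch of the payload described in the list above. The field names videoOrigin, swfOrigin and pageOrigin come from the description; the helper names, the shape of the appInfo argument, and the use of the standard URL class (rather than Gecko's URI objects) are assumptions made for illustration only.

```javascript
// Sketch only: helper names and the appInfo shape are hypothetical.

// Reduce a URL to its host so that no path or query string is reported.
function domainOf(urlString) {
  return new URL(urlString).hostname;
}

// Build the standalone payload. It deliberately carries no client or
// profile identifier, matching the requirement that the data not be tied
// to the user's telemetry profile ID.
function buildVideoLoadPayload(videoUrl, swfUrl, pageUrl, appInfo) {
  return {
    videoOrigin: domainOf(videoUrl),
    swfOrigin: domainOf(swfUrl),
    pageOrigin: domainOf(pageUrl),
    version: appInfo.version,
    channel: appInfo.channel,
    buildid: appInfo.buildid,
    locale: appInfo.locale,
  };
}
```

Reporting only the host keeps video paths and query tokens out of the submitted data.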

This is fairly urgent.
Comment 1 Till Schneidereit [:till] 2014-12-08 11:45:28 PST
I haven't had the time to look into actually implementing this. I did discuss it with gfritzsche, though, so I think he knows about as much about it as I do.
Comment 2 Benjamin Smedberg [:bsmedberg] 2014-12-08 12:05:16 PST
Can you tell from the first few bytes of a network request whether it's a video or not? Or if you don't know, who's the best person to ask?
Comment 3 Till Schneidereit [:till] 2014-12-08 13:21:56 PST
The devtools network monitor does categorization, and it detects at least mp4 and flv files as videos. I don't know who's working on that code, but they'll certainly know how to categorize the file types.
Comment 4 Georg Fritzsche [:gfritzsche] 2014-12-09 02:31:20 PST
(In reply to Till Schneidereit [:till] from comment #1)
> I haven't had the time to look into actually implementing this. I did
> discuss it with gfritzsche, though, so I think he knows about as much about
> it as I do.

I don't think we discussed this, maybe you talked to Aaron about this?
Comment 5 Till Schneidereit [:till] 2014-12-09 07:37:45 PST
I talked to Georg about this. Our next step needs to be to figure out who knows about the content type categorization the network monitor does.

@fitzgen, who in the devtools team would know about where this categorization happens? I'm talking about how the network monitor allows one to filter on content type and, in particular, enables just seeing loaded media files. As this works for mp4 and flv files loaded by Flash instances, it seems to be exactly what we need here.
Comment 6 Till Schneidereit [:till] 2014-12-09 08:46:02 PST
Talked to jryans about this:

The network monitor uses nsIHttpActivityObserver to observe requests[1], and has its own categorization based on mime types[2]. Note that that'll underestimate loading of video files to some extent: Flash doesn't care about mime types at all, so videos might be delivered with incorrect ones. That doesn't seem to be too much of a concern, though, as I'd expect the major sites, which we're interested in, to get this right.

[1] http://dxr.mozilla.org/mozilla-central/source/toolkit/devtools/webconsole/network-monitor.js#504
[2] http://mxr.mozilla.org/mozilla-central/source/browser/devtools/netmonitor/netmonitor-view.js#930
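
A rough sketch of MIME-type-based categorization as described in this comment. This is not the network monitor's code; the function name and the rule (anything under video/ counts) are assumptions for illustration.

```javascript
// Sketch only: a simplified stand-in for MIME-based categorization.
// Treats any "video/..." content type as a video load.
function isVideoContentType(contentType) {
  if (typeof contentType !== "string") {
    return false;
  }
  // Drop parameters such as "; charset=..." and normalize case.
  const mime = contentType.split(";")[0].trim().toLowerCase();
  return mime.startsWith("video/");
}
```

A video served as application/octet-stream is not counted by this check, which is the underestimation the comment mentions.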
Comment 7 Georg Fritzsche [:gfritzsche] 2014-12-10 03:17:02 PST
Thanks for checking till :)

Given that this seems relatively straightforward and already has a devtools implementation, should we maybe just have QA sanity-check this against known popular sites?
Comment 8 Benjamin Smedberg [:bsmedberg] 2014-12-10 07:59:10 PST
Detecting video by MIME type might be "good enough". We should definitely check by actually running the experiment against a normal browsing session. But what I don't think we've solved is associating that network load with a plugin. I emailed bz about this and his response was:

"I don't think there's a 100% reliable way.

You can sort of try to correlate the two by hooking nsIContentPolicy to watch for requests to load a TYPE_OBJECT and caching the associated element (which should be the aContext for the call) in a WeakMap or something, keyed on the URI, and then when you see the actual network load using its URI as the key to look up the element.

This would cover common cases, I'd think, though there are ways to fool it if one is trying to (e.g. loading the same URI over XHR instead or whatnot)."

So we could either add a reliable way (probably by having the nsPluginStreamListenerPeer implement a new interface to hand out a reference to its loading nsIObjectLoadingContent) or try to solve this using a content policy hackaround.
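
A minimal sketch of the content-policy correlation idea quoted above, written as plain JavaScript with no Gecko interfaces. The class and method names are hypothetical. One deviation from the quoted suggestion: JavaScript WeakMap keys must be objects, so a URI string cannot be a WeakMap key; this sketch uses an ordinary Map keyed on the URI string and removes each entry on lookup so elements are not held indefinitely.

```javascript
// Sketch only: names are hypothetical, and a plain Map stands in for the
// "WeakMap or something" from the quoted suggestion.
class PluginLoadTracker {
  constructor() {
    // URI string -> element that requested the load.
    this.pending = new Map();
  }

  // Call from the content-policy hook when an object (plugin) load is seen.
  noteObjectLoad(uriSpec, element) {
    this.pending.set(uriSpec, element);
  }

  // Call when the matching network load is observed. Returns the element,
  // or undefined if no object load was recorded for this URI.
  takeElementFor(uriSpec) {
    const element = this.pending.get(uriSpec);
    this.pending.delete(uriSpec);
    return element;
  }
}
```

As the quoted reply notes, this is not fully reliable: a request for the same URI made by other means would match the recorded entry as well.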
Comment 9 Georg Fritzsche [:gfritzsche] 2014-12-10 08:09:59 PST
How quickly do we need this?
Adding this to the nsPluginStreamListenerPeer will still require some time to get to beta.
If we don't have that time we should look at the content policy.
Comment 10 Benjamin Smedberg [:bsmedberg] 2014-12-10 08:58:14 PST
I'm hoping to have useful data within 2 weeks. That should give us time to uplift a simple patch to beta, if that's the best way.
Comment 11 Georg Fritzsche [:gfritzsche] 2014-12-10 09:44:09 PST
Ok, do we have anyone to take this on?
Or shall I check whether the content policy seems sufficient?
Comment 12 Chris Peterson [:cpeterson] 2014-12-11 12:23:14 PST
This video telemetry will be useful for deciding when or whether to proceed with Shumway for Facebook's video player.
Comment 13 Tobias Schneider [:tobytailor] 2014-12-17 12:21:33 PST
According to our SWF Crawler results, these are the most commonly used third-party video players:

http://www.jwplayer.com/
http://flowplayer.org/
http://flv-player.net/players/maxi/
http://mediaelementjs.com/ (Flash fallback)

We were able to detect them by looking at file paths passed as parameters to the SWFs that contain known video extensions in their file name. Websites using their own player usually pass an ID to the SWF which then generates a path and loads the video at runtime. For those we have to rely on telemetry.
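
The parameter-based detection described in this comment could look roughly like the sketch below. It is not the SWF Crawler's code: the function name, the extension list (mp4 and flv, the two formats named earlier in this bug), and the assumption that parameters arrive as a URL-encoded flashvars string are all illustrative.

```javascript
// Sketch only: the extension list and names are assumptions.
// Matches values whose path ends in a known video extension, optionally
// followed by a query string or fragment.
const VIDEO_PATH_PATTERN = /\.(mp4|flv)(?:$|[?#])/i;

// Return the SWF parameters whose values look like video file paths.
function videoPathsInFlashVars(flashVars) {
  const found = [];
  for (const [name, value] of new URLSearchParams(flashVars)) {
    if (VIDEO_PATH_PATTERN.test(value)) {
      found.push({ name, value });
    }
  }
  return found;
}
```

A player that only receives an opaque ID yields an empty result, which is the case the comment says has to be covered by telemetry.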
Comment 14 Benjamin Smedberg [:bsmedberg] 2015-01-08 11:39:44 PST
Created attachment 8546096
bootstrap.js (nsIContentPolicy implementation for logging)

bz, I'm not super-familiar with nsIContentPolicy, but this appears to work in local testing. Can you check the implementation of shouldProcess to make sure that it doesn't do anything forbidden (in particular I'm asking about the node.ownerDocument.defaultView.top.location sequence), and won't have unexpected performance impact?
Comment 15 Boris Zbarsky [:bz] 2015-01-08 12:15:02 PST
Comment on attachment 8546096
bootstrap.js (nsIContentPolicy implementation for logging)

This is short-circuiting pretty early in most cases via the type check, so the main perf impact will be the actual call into JS...  I _think_ that should be OK.  Especially if this policy is only registered when telemetry is enabled.

>    let resourceDomain = location.hostname;

That line doesn't make sense.  There is no "hostname" property on URIs.  Did you mean location.host?

>      let filePath = location.filePath.toLowerCase();

Why not:

  var extension = location.fileExtension.toLowerCase();

and then a bunch of equality compares?

>      pageDomain = context.QueryInterface(Ci.nsIDOMNode).ownerDocument.location.hostname;

You shouldn't need the QI here.

Past that, I think this should be safe, yes.

The location will be the location of the current document, which may not be the document involved in general, but for this particular case I expect them to always match up.

>      topDomain = context.QueryInterface(Ci.nsIDOMNode).ownerDocument.defaultView.top.location.hostname;

Again, no need for the QI.

r=me
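
The extension-based check suggested in this review might be sketched as follows. This is plain JavaScript using the standard URL class as a stand-in for the Gecko URI object, so fileExtensionOf only approximates what location.fileExtension provides; the function names and the extension list are assumptions, and a Set lookup replaces the "bunch of equality compares".

```javascript
// Sketch only: extension list and helper names are hypothetical.
const VIDEO_EXTENSIONS = new Set(["mp4", "flv"]);

// Approximate a URI's file extension: the text after the last "." in the
// final path segment, lower-cased; "" if the segment has no ".".
function fileExtensionOf(urlString) {
  const path = new URL(urlString).pathname;
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
}

// True when the URL's extension is one of the known video extensions.
function looksLikeVideo(urlString) {
  return VIDEO_EXTENSIONS.has(fileExtensionOf(urlString));
}
```

Comparing the extension rather than searching the whole file path avoids matching directory names that happen to contain a video extension.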
Comment 16 Benjamin Smedberg [:bsmedberg] 2015-01-14 14:28:08 PST
Created attachment 8549189
flash-videoloads
Comment 17 :Felipe Gomes (needinfo me!) 2015-01-15 08:00:25 PST
Comment on attachment 8549189
flash-videoloads

Review of attachment 8549189:
-----------------------------------------------------------------

I'm not an expert in the factory/category manager parts, but it looks simple enough and I'm sure you know what you're doing here :)
Comment 18 Benjamin Smedberg [:bsmedberg] 2015-01-15 08:35:58 PST
http://hg.mozilla.org/webtools/telemetry-experiment-server/rev/d2683a911749

I intend to QA this myself, since it will be pretty simple.
