Telemetry experiment: determine the most popular Flash video sites




3 years ago
2 years ago


(Reporter: Benjamin Smedberg, Assigned: Benjamin Smedberg)


Firefox Tracking Flags

(Not tracked)



(2 attachments)



3 years ago
We know from user feedback and market research that problems watching videos are among the most common user complaints. Beyond YouTube, though, we don't know which video sites our users visit most. We want to measure this using telemetry with the beta audience.

Technical details:
* This will be deployed as a telemetry experiment to a sample of the beta population
* I understand that Shumway was already planning on measuring codec usage by introspecting network requests made by Flash. Assuming that code already exists, we should use the same mechanism to introspect merely the fact that a video is being loaded. till/mbx, can you provide a pointer for this?
* When a video load is detected, we will create a payload with the following information and send it to the cloud services ingestion system:
** the domain of the video content: "videoOrigin"
** the domain of the .swf: "swfOrigin"
** the domain of the page loading the plugin: "pageOrigin"
** version/channel/buildid/locale
* This data will not be collected as part of the current telemetry session ping, because we don't want to associate the data with the user's telemetry profile ID.
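Here is a minimal sketch of the payload in the list above, in plain JavaScript that runs under Node. It assumes "domain" means the URL's hostname. The helper names (hostOf, buildVideoPayload), the build-metadata key names and all example values are hypothetical placeholders, not the experiment's actual schema.

```javascript
// Reduce a URL to its hostname so paths and query strings are never reported.
// Returns null for anything that does not parse as an absolute URL.
function hostOf(spec) {
  try {
    return new URL(spec).hostname;
  } catch (e) {
    return null;
  }
}

// Hypothetical payload builder using the field names from the bug.
// Note: no client or profile identifier is included, because the payload is
// sent on its own rather than inside the telemetry session ping.
function buildVideoPayload(videoUrl, swfUrl, pageUrl, build) {
  return {
    videoOrigin: hostOf(videoUrl),
    swfOrigin: hostOf(swfUrl),
    pageOrigin: hostOf(pageUrl),
    version: build.version,
    channel: build.channel,
    buildid: build.buildid,
    locale: build.locale,
  };
}

// Placeholder values for illustration only.
const payload = buildVideoPayload(
  "https://cdn.example.net/videos/clip.flv?token=abc",
  "https://static.example.com/player.swf",
  "https://www.example.org/watch/123",
  { version: "36.0", channel: "beta", buildid: "00000000000000", locale: "en-US" }
);
console.log(JSON.stringify(payload));
```

Reducing each URL to its host before it reaches the payload keeps per-request details such as the token in the video URL out of the data.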

This is fairly urgent.
Flags: needinfo?(till)
Flags: needinfo?(mbebenita)
I haven't had the time to look into actually implementing this. I did discuss it with gfritzsche, though, so I think he knows about as much about it as I do.
Flags: needinfo?(till)

Comment 2

3 years ago
Can you tell from the first few bytes of a network request whether it's a video or not? Or if you don't know, who's the best person to ask?
Flags: needinfo?(till)
The devtools network monitor categorizes requests by type and detects at least mp4 and flv files as videos. I don't know who works on that code, but they'll certainly know how it categorizes file types.
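The question above asks whether the first few bytes identify a video. For the two formats named here, a signature check is one possible answer. The sketch below is an assumption, not what the devtools monitor does. FLV files begin with the ASCII bytes "FLV", and MP4 files normally carry the box type "ftyp" at byte offset 4. A request that resumes mid-file, such as a byte-range request, would not match either signature. The function name is hypothetical.

```javascript
// Hypothetical helper: identify FLV or MP4 data from the first bytes of a
// response body. Returns "flv", "mp4", or null when neither signature matches.
function sniffVideoContainer(bytes) {
  // FLV: the file header begins with ASCII "FLV" (0x46 0x4C 0x56).
  if (bytes.length >= 3 &&
      bytes[0] === 0x46 && bytes[1] === 0x4c && bytes[2] === 0x56) {
    return "flv";
  }
  // MP4 (ISO base media): a 4-byte box size, then the box type "ftyp".
  if (bytes.length >= 8 &&
      bytes[4] === 0x66 && bytes[5] === 0x74 &&
      bytes[6] === 0x79 && bytes[7] === 0x70) {
    return "mp4";
  }
  return null;
}

const flvStart = Uint8Array.from([0x46, 0x4c, 0x56, 0x01, 0x05]);
const mp4Start = Uint8Array.from([0x00, 0x00, 0x00, 0x20, 0x66, 0x74, 0x79, 0x70]);
console.log(sniffVideoContainer(flvStart), sniffVideoContainer(mp4Start));
```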
Flags: needinfo?(till)
(In reply to Till Schneidereit [:till] from comment #1)
> I haven't had the time to look into actually implementing this. I did
> discuss it with gfritzsche, though, so I think he knows about as much about
> it as I do.

I don't think we discussed this; maybe you talked to Aaron about it?
I talked to Georg about this. Our next step needs to be to figure out who knows about the content type categorization the network monitor does.

@fitzgen, who in the devtools team would know about where this categorization happens? I'm talking about how the network monitor allows one to filter on content type and, in particular, enables just seeing loaded media files. As this works for mp4 and flv files loaded by Flash instances, it seems to be exactly what we need here.
Flags: needinfo?(mbebenita) → needinfo?(nfitzgerald)
Talked to jryans about this:

The network monitor uses nsIHttpActivityObserver to observe requests[1] and has its own categorization based on MIME types[2]. Note that this will undercount video loads to some extent. Flash doesn't care about MIME types at all, so videos might be served with incorrect ones. That doesn't seem like much of a concern, though, as I'd expect the major sites, which are the ones we're interested in, to get this right.
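For illustration, a MIME-based check in the spirit of that categorization might look like the sketch below. It is hypothetical: it accepts only video/* types, which is not the monitor's actual rule set. It also shows the undercounting described above, since a video served as application/octet-stream is missed.

```javascript
// Hypothetical check: treat a response as video if its Content-Type header
// names a video/* MIME type. Header parameters are ignored.
function isVideoContentType(header) {
  if (typeof header !== "string") return false;
  // "video/mp4; codecs=..." -> "video/mp4"; MIME types are case-insensitive.
  const mime = header.split(";")[0].trim().toLowerCase();
  return mime.startsWith("video/");
}

console.log(isVideoContentType("video/x-flv"));              // true
console.log(isVideoContentType("application/octet-stream")); // false: missed
```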

Flags: needinfo?(nfitzgerald)
Thanks for checking, Till :)

Given that this seems relatively straightforward and already has a devtools implementation, should we just send QA to sanity-check this against known popular sites?
Flags: needinfo?(benjamin)

Comment 8

3 years ago
Detecting video by MIME type might be "good enough". We should definitely verify that by actually running the experiment against a normal browsing session. What I don't think we've solved is associating that network load with a plugin. I emailed bz about this and his response was:

"I don't think there's a 100% reliable way.

You can sort of try to correlate the two by hooking nsIContentPolicy to watch for requests to load a TYPE_OBJECT and caching the associated element (which should be the aContext for the call) in a WeakMap or something, keyed on the URI, and then when you see the actual network load using its URI as the key to look up the element.

This would cover common cases, I'd think, though there are ways to fool it if one is trying to (e.g. loading the same URI over XHR instead or whatnot)."

So we could either add a reliable way (probably by having the nsPluginStreamListenerPeer implement a new interface to hand out a reference to its loading nsIObjectLoadingContent) or try to solve this using a content policy hackaround.
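Below is a minimal sketch of the URI-keyed correlation bz describes, with plain objects standing in for Gecko's nsIURI and DOM elements. A WeakMap cannot use string keys, so this version keeps a Map from URI spec to a WeakRef of the element, which means the cache does not keep elements alive. The class and method names are hypothetical.

```javascript
// Hypothetical correlator: the content-policy hook records which element asked
// for a URI, and the network observer later looks the element up by that URI.
class LoadCorrelator {
  constructor() {
    // URI spec -> WeakRef(element). WeakMap keys must be objects, so a Map of
    // WeakRefs is used instead; the elements themselves are held only weakly.
    this.pending = new Map();
  }

  // Call from the content-policy check for a plugin load.
  record(uriSpec, element) {
    this.pending.set(uriSpec, new WeakRef(element));
  }

  // Call from the network observer. Returns the element or null, and drops
  // the entry so one policy check matches at most one network load.
  take(uriSpec) {
    const ref = this.pending.get(uriSpec);
    if (ref === undefined) return null;
    this.pending.delete(uriSpec);
    return ref.deref() ?? null;
  }
}

const correlator = new LoadCorrelator();
const embed = { tagName: "EMBED" }; // stand-in for a plugin element
correlator.record("https://cdn.example.net/clip.flv", embed);
```

As bz notes, this approach is best-effort. A second load of the same URI from another source, such as an XHR, would consume or overwrite the entry. Entries that are never matched stay in the Map, so a real implementation would also need some kind of expiry.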
Flags: needinfo?(benjamin)
How quickly do we need this?
Adding this to the nsPluginStreamListenerPeer will still require some time to get to beta.
If we don't have that time we should look at the content policy.

Comment 10

3 years ago
I'm hoping to have useful data within 2 weeks. That should give us time to uplift a simple patch to beta, if that's the best way.
OK, do we have anyone to take this on?
Or shall I check whether the content policy approach is sufficient?
This video telemetry will be useful for deciding when or whether to proceed with Shumway for Facebook's video player.
Blocks: 1110300
According to our SWF Crawler results, these are the most commonly used third-party video players (Flash fallback):

We were able to detect them by looking at file paths passed as parameters to the SWFs whose file names contain known video extensions. Websites using their own player usually pass an ID to the SWF, which then generates a path and loads the video at runtime. For those, we have to rely on telemetry.
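Here is a hypothetical sketch of the path-based detection described above. It parses a SWF's parameter string (for example, flashvars) and reports values whose path ends in a video extension. The extension list is an assumption, since the crawler's actual heuristics aren't given in this bug.

```javascript
// Extensions treated as video files (an assumed list for this sketch).
const VIDEO_EXTENSIONS = new Set(["flv", "f4v", "mp4", "m4v"]);

// Return the parameters in a flashvars-style string whose values look like
// paths to video files.
function findVideoPaths(flashvars) {
  const found = [];
  // URLSearchParams splits on "&" and percent-decodes names and values.
  for (const [name, value] of new URLSearchParams(flashvars)) {
    // Ignore any query string or fragment, then take the text after the last dot.
    const path = value.split(/[?#]/)[0].toLowerCase();
    const dot = path.lastIndexOf(".");
    if (dot !== -1 && VIDEO_EXTENSIONS.has(path.slice(dot + 1))) {
      found.push({ name, value });
    }
  }
  return found;
}

console.log(findVideoPaths("file=https%3A%2F%2Fcdn.example.net%2Fintro.mp4&autostart=true"));
console.log(findVideoPaths("videoId=12345"));
```

An ID-only parameter such as videoId=12345 produces no match. That is exactly the case where the player builds the path at runtime and telemetry is needed.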


3 years ago
Depends on: 1119291


3 years ago
Depends on: 1119302

Comment 14

3 years ago
Created attachment 8546096
bootstrap.js (nsIContentPolicy implementation for logging)

bz, I'm not super-familiar with nsIContentPolicy, but this appears to work in local testing. Can you check the implementation of shouldProcess to make sure that it doesn't do anything forbidden (in particular I'm asking about the sequence), and won't have unexpected performance impact?
Assignee: nobody → benjamin
Attachment #8546096 - Flags: review?(bzbarsky)
Comment on attachment 8546096
bootstrap.js (nsIContentPolicy implementation for logging)

This is short-circuiting pretty early in most cases via the type check, so the main perf impact will be the actual call into JS...  I _think_ that should be OK.  Especially if this policy is only registered when telemetry is enabled.

>    let resourceDomain = location.hostname;

That line doesn't make sense.  There is no "hostname" property on URIs.  Did you mean

>      let filePath = location.filePath.toLowerCase();

Why not:

  var extension = location.fileExtension.toLowerCase();

and then a bunch of equality compares?

>      pageDomain = context.QueryInterface(Ci.nsIDOMNode).ownerDocument.location.hostname;

You shouldn't need the QI here.

Past that, I think this should be safe, yes.

The location will be the location of the current document, which may not be the document involved in general, but for this particular case I expect them to always match up.

>      topDomain = context.QueryInterface(Ci.nsIDOMNode);

Again, no need for the QI.
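Folding these review comments into a content-policy check gives roughly the shape below. This is a simplified sketch, not attachment 8546096. Several parts are assumptions:
* The three-argument shouldLoad signature is simplified.
* The constants are stand-ins.
* Filtering on TYPE_OBJECT_SUBREQUEST is assumed.
* The video origin is read from the URI's host attribute.
* loggedLoads is a hypothetical sink.

```javascript
// Stand-ins for Ci.nsIContentPolicy constants; the add-on would read the real
// values from the interface rather than hard-coding them.
const ContentPolicy = { ACCEPT: 1, TYPE_OBJECT_SUBREQUEST: 12 };

// Hypothetical sink for detected loads.
const loggedLoads = [];

// Simplified shouldLoad: takes only the content type, the URI being loaded,
// and the requesting context (a DOM node).
function shouldLoad(contentType, location, context) {
  // Cheap early exit: nearly all loads are not plugin subrequests.
  if (contentType !== ContentPolicy.TYPE_OBJECT_SUBREQUEST) {
    return ContentPolicy.ACCEPT;
  }
  // Compare the URL's extension with plain equality checks.
  const ext = (location.fileExtension || "").toLowerCase();
  if (ext === "flv" || ext === "f4v" || ext === "mp4") {
    loggedLoads.push({
      videoOrigin: location.host,
      // context is already a DOM node, so no QueryInterface is needed.
      pageOrigin: context.ownerDocument.location.hostname,
    });
  }
  // This policy only observes; it never rejects a load.
  return ContentPolicy.ACCEPT;
}

const pageNode = { ownerDocument: { location: { hostname: "www.example.org" } } };
shouldLoad(ContentPolicy.TYPE_OBJECT_SUBREQUEST,
           { host: "cdn.example.net", fileExtension: "FLV" }, pageNode);
```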

Attachment #8546096 - Flags: review?(bzbarsky) → review+
Blocks: 1120590


3 years ago
Depends on: 1121686

Comment 16

3 years ago
Created attachment 8549189
Attachment #8549189 - Flags: review?(felipc)
Comment on attachment 8549189

Review of attachment 8549189:

I'm not an expert in the factory/category manager parts, but it looks simple enough and I'm sure you know what you're doing here :)
Attachment #8549189 - Flags: review?(felipc) → review+

Comment 18

3 years ago

I intend to QA this myself, since it will be pretty simple.
Last Resolved: 3 years ago
Resolution: --- → FIXED


3 years ago
Depends on: 1122537


3 years ago
Depends on: 1123888
No longer blocks: 1110300
No longer blocks: 1120590
Blocks: 1120590