Closed Bug 1685399 Opened 5 years ago Closed 5 years ago

Add telemetry helper class for media element and refactor play time related probes

Categories

(Core :: Audio/Video: Playback, task, P1)

task

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: alwu, Assigned: alwu)

References

(Blocks 1 open bug, Regressed 1 open bug)

Details

Attachments

(15 files, 2 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
3.19 KB, text/plain
chutten
: data-review+
Details

Some probes related with shutdown video decoder are not measured correctly, which cause inaccurate result. I'm going to improve them. See comment13 for more details.

quick update : I've finished the implementation, now I'm going to write tests for those permanent probes.

mCurrentLoadPlayTime was added in [1], which is no longer in use, so we should remove it.

[1] https://phabricator.services.mozilla.com/D18628

There are several functions related with an element's visibie state, which are confusing. So this patch is going to make them clearer and remove unnecessary function.

  • IsVisible() : add decription to mention that visibility state is only for layout level, which doesn't represent actual visible state.
  • rename IsHidden() to IsActualInvisible() : make it represent the actual visible state of an element.
  • remove IsActive() : current two callee of IsActive() only care about if the page is inactive or not, it doesn't care about if page hidden or not. So we can call owner doc's method directly.

Depends on D101106

There is no need for decoder to use both "document visibility" and "element's layout visibility" to determine if we should suspend decoding.

That can simply be done by checking HTMLMediaElement::IsActualInvisible().

Depends on D101107

Depends on D101109

PlayTimeAccumulator would be responsible to accumulate and report the play time related probes.

Depends on D101111

The cases of accumulated invisible play time we want to collect are that video keeps playing invisibly and still being alive in near future.

However, from the collected result [1], there are huge amount of data seems not the cases that we want to observe, which would probably be related with page unloading and unbinding element from DOM tree.

Therefore, we want to filter out those cases where video plays invisibly but would soon be gone or paused. Those cases would only generate really small amount of invisible time, which should be ignored and set it to zero.

[1] https://mzl.la/3q2XbhY

Depends on D101112

Depends on D101265

Depends on: 1685503

As current way we use lead to some incorrect measurements, here I will describe why I need to change them.

(1) Incorrect measurement of starting play time
We start accumulating time on play, which indicates the change of mPaused and doesn't mean media has actually started playing. The timer should be started on playing only.

(2) Does not stop accumulating time when encountering error, or aborting current load
We pause the timer only when encoutering wait or pause. Therefore, when error occurs or media changes to new resouces, the timer would keep accumulating time, which is wrong.

(3) Accumulating time while the page is in the bf-cache
We start accumulating invisible time in HTMLMediaElement::NotifyOwnerDocumentActivityChanged(), which seems a place to notify about its document activity change when page shows or hides. However, that function would also be called when a page moves in or out the bf-cache, so that means the timer would keep running when the page is in the bf-cache.

(4) Report overlapping accumulated result
We report the telemetry result when we suspend the page. The problem is similar with (3), that function is not only being called when the page gets discarded. That would also be called when the page goes to the bf-cache.

Here is an example,

  1. before enter bf-cache : playing 10s for visible time
  2. enter the bf-cache : REPORT 10s visible time and start accumulating invisible time (wrong!)
  3. being in the bf-cache in 10s : so far, we have 10s visible time and 10s in invisible time
  4. leave the bf-cache and stay in foreground 10s : so far, we have 20s visible time and 10s in invisible time
  5. enter the bf-cache again : REPORT 20s visible time and 10s invisible time

Let's ignore the invisible time that shouldn't be reported, we only focus on the visible time. In the first report, we've already reported 10s, but in the second report, the 20 seconds we reported is actullay including the first reported time, so the more times we report the probes, the more accumulated visible time would be, which is apparently wrong.

(5) inconsistent and inaccurate check for invisible media
We use IsHidden() to start the invisible timer here, but in another place we use the element's visiblity to start timer, which is inconsistent.

In addition, both IsHidden() and element's visibility state are not able to accurately determine the actual visible state for media element.

Summary: Refine the probes related with shutdown video decoder → Refine probes related play time accumulation
No longer depends on: 1685503
See Also: → 1685503

The cases of accumulated invisible play time we want to collect are that video keeps playing invisibly and still being alive in near future.

However, from the collected result [1], there are huge amount of data seems not the cases that we want to observe, which would probably be related with page unloading and unbinding element from DOM tree.

Therefore, we want to filter out those cases where video plays invisibly but would soon be gone or paused. Those cases would only generate really small amount of invisible time, which should be ignored and set it to zero.

[1] https://mzl.la/3q2XbhY

Depends on D101112

Attachment #9196233 - Attachment is obsolete: true

(In reply to Alastor Wu [:alwu] from comment #13)

Let's ignore the invisible time that shouldn't be reported, we only focus on the visible time. In the first report, we've already reported 10s, but in the second report, the 20 seconds we reported is actullay including the first reported time, so the more times we report the probes, the more accumulated visible time would be, which is apparently wrong.

What scenario have you tried? I tried a bunch of scenarios on youtube a few weeks ago, and the time were accurate, because youtube is a single page app, and doesn't really use the bf-cache like that. It's far from being the only very-high profile website that is like that.

Good find in any case. I'm just wondering about the real-world impact of this on the numbers.

Attachment #9196544 - Attachment is obsolete: true

(In reply to Paul Adenot (:padenot) from comment #15)

What scenario have you tried? I tried a bunch of scenarios on youtube a few weeks ago, and the time were accurate, because youtube is a single page app, and doesn't really use the bf-cache like that. It's far from being the only very-high profile website that is like that.

Good find in any case. I'm just wondering about the real-world impact of this on the numbers.

  1. go to https://alastor0325.github.io/htmltests/autoplay_tests/index.html
  2. click any link (eg. test01)
  3. start video
  4. go back to the previous page by clicking previous page button (Then the page would be in the bf-cache)
  5. click next page button (Then the page would leave the bf-cache)

In addtion, I'm still modifying my patches, I encountered some weird issues during testing, will continue to work on them tomorrow.

We don't want to report the accumulated time that we've reported before, see '(4)' in [1] for more details.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1685399#c13

Depends on D101559

Attachment #9196234 - Attachment description: Bug 1685399 - part9 : add chrome-only attributes to allow us check accumulated time during testing. → Bug 1685399 - part10 : add chrome-only attributes to allow us check accumulated time during testing.
Attachment #9196235 - Attachment description: Bug 1685399 - part10 : record probes on release channel as well. → Bug 1685399 - part11 : record probes on release channel as well.
Attachment #9196236 - Attachment description: Bug 1685399 - part11 : add test. → Bug 1685399 - part13 : add test.
Attachment #9196236 - Attachment description: Bug 1685399 - part13 : add test. → Bug 1685399 - part12 : use image to validate video data.

display represents the actual rendering result, which means that would be affected by CSS rules, but here we want to validate the real data.

Eg. if video has an valid image (640*360) but also has a CSS rule (display:none), it would not display on the page, but it do have video data.

Depends on D101265

Attachment #9196236 - Attachment description: Bug 1685399 - part12 : use image to validate video data. → Bug 1685399 - part13 : add test.
Attachment #9196757 - Attachment description: Bug 1685399 - part12 : use image to validate video data. → Bug 1685399 - part12 : only consider a video info as invalid when both image and display dimensions are null.
Attachment #9196756 - Attachment description: Bug 1685399 - part9 : clean timer after report the reuslt to prevent reporting same data again. → Bug 1685399 - part9 : clean timer after report the result to prevent reporting same data again.
Summary: Refine probes related play time accumulation → Add telemetry helper class for media element and refactor play time related probes

I plan to land these patches on this release, so change the priority to P1.

Priority: -- → P1

As we just modify our implementation for measuring probes, we should also extend them in order to collect the new result.

Attachment #9197172 - Attachment description: Bug 1685399 - part13 : extend expired verison for some probes. → Bug 1685399 - part14 : extend expired verison for some probes.
Attached file data-review-request
Attachment #9197173 - Flags: data-review?(chutten)
Attachment #9196757 - Attachment description: Bug 1685399 - part12 : only consider a video info as invalid when both image and display dimensions are null. → Bug 1685399 - part12 : check video info's display and image size to verify if video is valid.

Comment on attachment 9197173 [details]
data-review-request

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

Alastor Wu is responsible for the permanent collections. The non-permanent ones will expire in Firefox 92.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 2, Interaction.

Is the data collection request for default-on or default-off?

Default on for all channels. (Except for a few pre-release-only ones)

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

Yes. Alastor Wu is responsible for renewing or removing expiring collection before they expire in Firefox 92.


Result: datareview+

Attachment #9197173 - Flags: data-review?(chutten) → data-review+

I've tested following behaviors on some popular media websites, such as Youtube, Vimeo, and reported telemetry all work as expectation.

(1) switch page between background and foreground
(2) use same element but switch to different source to play
(3) skip ad
(4) navigation between different videos
(5) close page when video is still playing or already being paused
(6) swtich different resolution for video

Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/80f8af60004d part1 : remove `mCurrentLoadPlayTime`. r=padenot https://hg.mozilla.org/integration/autoland/rev/9fc499a76660 part2 : reorgnize functions being used for determining if element is visible. r=padenot,webidl,mccr8 https://hg.mozilla.org/integration/autoland/rev/7d8b2275be91 part3 : use actual invisible state to determine if we should suspend decoding. r=padenot https://hg.mozilla.org/integration/autoland/rev/33df9e2b87a1 part4 : check media info directly to know if video is valid. r=padenot https://hg.mozilla.org/integration/autoland/rev/c4bb5ad5c8c9 part5 : remove unused state 'PLAY_STATE_START'. r=padenot https://hg.mozilla.org/integration/autoland/rev/68d935d0850a part6 : remove unused class 'MockMediaDecoderOwner'. r=padenot https://hg.mozilla.org/integration/autoland/rev/a434e3cd5efc part7 : implement a helper class to accumulate and report the telemetry probe. r=padenot,bryce https://hg.mozilla.org/integration/autoland/rev/75afd5fca74c part8 : remove unused methods in 'TimeDurationAccumulator'. r=padenot https://hg.mozilla.org/integration/autoland/rev/676740b06571 part9 : clean timer after report the result to prevent reporting same data again. r=padenot https://hg.mozilla.org/integration/autoland/rev/0527bd0ca81f part10 : add chrome-only attributes to allow us check accumulated time during testing. r=padenot,emilio https://hg.mozilla.org/integration/autoland/rev/22c0d4d6f2e1 part11 : record probes on release channel as well. r=bryce,padenot https://hg.mozilla.org/integration/autoland/rev/9837de62d813 part12 : check video info's display and image size to verify if video is valid. r=bryce https://hg.mozilla.org/integration/autoland/rev/8379725ca873 part13 : add test. r=padenot https://hg.mozilla.org/integration/autoland/rev/0d5d7994edd4 part14 : extend expired verison for some probes. r=bryce
Regressions: 1625099
No longer regressions: 1625099
See Also: → 1714264

For the result of VIDEO_PLAY_TIME_MS, it seems reasonable to me. The previous collective way would sometime result in reporting same accumulated result multiple times. Eg. a video has already been played 10s and it triggers the point of reportting telemetry (report 10s). Then it keeps playing to 20s and trigger another point (report 20s). So we actually report 30s for the media that only plays for 20s.

The new triggering point of reporting telemetry will happen whenever the media decoder stops decoding. Eg. pause, seek, resolution change, or media gets destroyed and it won't report the same value again. Eg. play a video that is 30 minutes long, but pause on the place at 5 minutes for a break (report 5 minutes), and then go back to watch the rest of 25 minutes (report 25 minutes)

So I expect to see (1) more reported sample count and (2) each sample count has less value because we won't accumulate data. You can see both situations happened on the telemetry report, so I think it's no problem.


For the result of VIDEO_WIDEVINE_PLAY_TIME_MS, the result looks a little bit suspicous. I tested on this video and the result looks the same as the real time that I spent on watching the video. Because the result looks too low, I wonder if we [can't get tFor the result of VIDEO_WIDEVINE_PLAY_TIME_MS, the result looks a little bit suspicous. I tested on this video and the result looks the same as the real time that I spent on watching the video. Because the result looks too low, I wonder if we can't get the key system at the time we report the result to the Telemetry, which makes us not be able to report the CDM play time. Bryce, any thought?

Flags: needinfo?(alwu) → needinfo?(bvandyk)

It's possible we're not getting the keysystem. This will also happen if the MediaKeys are detached from the media element by the time we record the data (and may be the result of some error I haven't predicted). If keys are not being detected, that could also affect the old telemetry, as the same mechanism was used. It's possible the issue is exacerbated by the changes though.

Note that the encrypted play time probe shows a similar trend. This probe relies on us having parsed some encrypted data, and shouldn't be impacted by the presence of MediaKeys (though we'd expect a strong correlation in the cases). There's a discontinuity in that graph, but ignoring that, we see the same drop off in median times, but a much larger number of samples collected.

The Widevine/Encrypted probes previously had some very large values at the top of their range that don't exist follow the changes. Maybe those were incorrect super counts that came from binge watching (if we had long watch times that were reported multiple times). It's possible that we now have those rolled into more reasonable counts that we report each time we change episode or similar.

Flags: needinfo?(bvandyk)
See Also: → 1754648

(In reply to Alastor Wu [:alwu] from comment #7)

Created attachment 9196002 [details]
Bug 1685399 - part6 : remove unused class 'MockMediaDecoderOwner'.

This was unused when introduced in https://hg.mozilla.org/mozilla-central/rev/e4679c01892b

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: