Closed Bug 1618529 Opened 5 years ago Closed 5 years ago

Encrypted subsamples with clear ranges near UINT16_MAX can overflow clear ranges during AnnexB conversion

Categories

(Core :: Audio/Video: Playback, defect, P2)

73 Branch
defect

Tracking

()

RESOLVED FIXED
83 Branch
Tracking Status
firefox-esr78 --- fixed
firefox81 --- wontfix
firefox82 --- verified
firefox83 --- verified

People

(Reporter: janliska, Assigned: bryce)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36

Steps to reproduce:

Actual results:

Most of the times video stops playing

Expected results:

Video should start playing at given point.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Assignee: nobody → bvandyk
Priority: -- → P3

We've updated handling around certain streams and it should improve compat with protected streams. I've retested the link in the first comment and believe it's now working. Could you retest (with Firefox nightly) and let me know if you see the same?

Flags: needinfo?(janliska)

Thanks for update. I've tested nightly build with given stream and I am sorry to say that but the issue persists. Sometimes it looks good for a long time, but if you seek frequently eventually it stops. And sometimes it breaks right at the beginning.
Can I help you somehow with debugging? I can send you screen capture with issue, would it be helpful to you?

Flags: needinfo?(janliska)

(In reply to janliska from comment #3)

Thanks for update. I've tested nightly build with given stream and I am sorry to say that but the issue persists. Sometimes it looks good for a long time, but if you seek frequently eventually it stops. And sometimes it breaks right at the beginning.
Can I help you somehow with debugging? I can send you screen capture with issue, would it be helpful to you?

If you have any logging that we could use to trace the specific series of actions that leads to the stall, that would be helpful. E.g. if there a specific point that fails when seeking because Fx fails to parse the data at that point, it would be useful to know about that and it could help us produce a more minimal and deterministic test case.

First of all, I would like to apologize for the delay in communication, nevertheless finding the exact error trigger took us longer than we expected.

We were able to find out that error happens in the moment of change of m4v init file. This occurs when changing ABR profile or during seeking over DASH <Period>.
For testing you can use the previously sent https://drm-test.livesporttv.cloud/issue/index.php (login and password was sent by 'janliska' 28 February 2020, in case of need I can send again).

Latest tests on 77.0.1 (64-bit) and Nightly 79.0a1 (2020-06-08) (64-bit) both for Windows (tested also on earlier versions since inception of this issue).

Would like to receive clarification of anything else? Please let me know.

Hello Bryce, is there any news on your side regarding this? As M pointed out some time ago, we trace down the issue to the change of m4v init files. Do you need any additional information from us to investigate?

If you're able to identify a the files that can be appended to trigger the issue it would be useful if you could either send them through so we can reproduce the issue or to set up a site that just serves those files to repro the error. In either case this would allow us to better isolate the specific issue.

(In reply to Bryce Seager van Dyk (:bryce) from comment #7)

If you're able to identify a the files that can be appended to trigger the issue it would be useful if you could either send them through so we can reproduce the issue or to set up a site that just serves those files to repro the error. In either case this would allow us to better isolate the specific issue.

Can I send you a demo via email? I cannot post this publicly. Thanks

(In reply to Bryce Seager van Dyk (:bryce) from comment #7)

If you're able to identify a the files that can be appended to trigger the issue it would be useful if you could either send them through so we can reproduce the issue or to set up a site that just serves those files to repro the error. In either case this would allow us to better isolate the specific issue.

I've just sent you details to bvandyk@mozilla.com

Thanks, confirming I received the details and will investigate.

I'm reproing by switching resolution of the streams. It would be useful to have a deterministic testcase if possible. E.g. one that served an init segment for a certain resolution, gave some media at that resolution, then switched and reliably fails.

In the mean time, the issue is that the Widevine CDM is failing to decrypt and decode the data provided after a resolution switch. The CDM itself is largely a black box, but it's giving the error code indicating a general purpose decryption failure. This suggests the CDM thinks it has the key to decrypt the data it receives, but is failing to decrypt when it actually tries.

(In reply to Bryce Seager van Dyk (:bryce) from comment #11)

I'm reproing by switching resolution of the streams. It would be useful to have a deterministic testcase if possible. E.g. one that served an init segment for a certain resolution, gave some media at that resolution, then switched and reliably fails.

In the mean time, the issue is that the Widevine CDM is failing to decrypt and decode the data provided after a resolution switch. The CDM itself is largely a black box, but it's giving the error code indicating a general purpose decryption failure. This suggests the CDM thinks it has the key to decrypt the data it receives, but is failing to decrypt when it actually tries.

I've just sent you an updated link via email.

(In reply to Bryce Seager van Dyk (:bryce) from comment #11)

I'm reproing by switching resolution of the streams. It would be useful to have a deterministic testcase if possible. E.g. one that served an init segment for a certain resolution, gave some media at that resolution, then switched and reliably fails.

In the mean time, the issue is that the Widevine CDM is failing to decrypt and decode the data provided after a resolution switch. The CDM itself is largely a black box, but it's giving the error code indicating a general purpose decryption failure. This suggests the CDM thinks it has the key to decrypt the data it receives, but is failing to decrypt when it actually tries.

Hello Bryce, was the last video nay helpful? It keeps failing quite reliably. Let me know if I can provide any more information.

Think I've located the source of the issue.

ISO/IEC 23001-7 specifies that the # clear bytes for subsamples are stored in a uint16 and we read that into a uint16. However, we end up expanding that range while performing AnnexB conversion -- which typically takes place on the first sample after we (re) init. If the number of clear bytes is near UINT16_MAX then this expansion can lead to an overflow, which is what is happening here.

The Widevine interface accepts a uint32_t for the # clear bytes values, possible to mitigate this issue. We should be able to mitigate this by storing the value as a uint32_t internally.

Severity: normal → S3
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: P3 → P2
Summary: Widevine integration → Encrypted subsamples with clear ranges near UINT16_MAX can overflow clear ranges during AnnexB conversion

Related issue on Shaka Player's issue tracker: https://github.com/google/shaka-player/issues/2427

Thanks, Mozilla, for your work on this!

Add test code to ensure AnnexB conversions behave as we expect. This adds some
coverage for non-encrypted conversions that we only tested with broader tests
until now. It also adds a test to ensure we don't overflow our subsample sizes
when dealing with encrypted media with very large subsamples. This latter test
covers the issue seen in bug 1618529.

This avoids us risking an overflow when we convert encrypted media with
subsamples to AnnexB (since that conversion can grow the clear sizes of the
sample). See the test in the preceding patch for an example of how and why this
happens.

Depends on D92299

Pushed by bvandyk@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a9ede779b827 Add gtest coverage for AnnexB conversions. r=jolin https://hg.mozilla.org/integration/autoland/rev/c1a849c9d7bf Store clear crypto subsample info in uint32 rather than uint16. r=jolin
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 83 Branch

What's the impact of this bug? Do we need to uplift to branches? (I'm going to guess no at least for 82 given the S3 severity and where we are in the cycle.)

Flags: needinfo?(bvandyk)

(In reply to Julien Cristau [:jcristau] from comment #20)

What's the impact of this bug? Do we need to uplift to branches? (I'm going to guess no at least for 82 given the S3 severity and where we are in the cycle.)

We'll fail to play certain encrypted media streams. I'm not sure about the prevalence of these streams. The packaging/encrypting software needs to behave in a certain way (using very large clear ranges on samples). I hadn't seen this before, and the packagers I'm aware of do the opposite and keep the clear ranges small[0]. However, there's clearly some packagers out there doing this.

We should uplift to ESR. I'd been thinking about requesting uplift for beta. The bug and fix itself are well understood and the fix should be a low regression risk (we widen some data from 16 bits -> 32 bits that would be widened later in our media pipeline anyway). I'll request uplift.

[0] Which makes creating test media for end to end tests of this tough, I'd need to do a custom build of shaka packager or bento4's encryptor.

Flags: needinfo?(bvandyk)

Comment on attachment 9179374 [details]
Bug 1618529 - Store clear crypto subsample info in uint32 rather than uint16. r?jolin

Beta/Release Uplift Approval Request

  • User impact if declined: Certain encrypted media streams will fail to playback. Specifically encrypted media that uses large clear (unencrypted) ranges. It's difficult to know the prevalence of such media, but given this bug report as well as reports on Google's shaka packager github (one is linked in the bug), people are encountering this issue.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: 1668099 - I think this will be needed to get the tests in this chain working.
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The changes made are fairly small in scope and well understood -- we widen and integer (16 -> 32 bits) that would be widened later anyway. We do it earlier to avoid an overflow. The changes have been verified both manually, and by adding a test which which allowed for a red green refactor.
  • String changes made/needed: None.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixes us failing to play back encrypted media.
  • User impact if declined: Certain encrypted media streams will fail to playback. Specifically encrypted media that uses large clear (unencrypted) ranges. It's difficult to know the prevalence of such media, but given this bug report as well as reports on Google's shaka packager github (one is linked in the bug), people are encountering this issue.
  • Fix Landed on Version: 83
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The changes made are fairly small in scope and well understood -- we widen and integer (16 -> 32 bits) that would be widened later anyway. We do it earlier to avoid an overflow. The changes have been verified both manually, and by adding a test which which allowed for a red green refactor.

Note: I'm fairly sure bug 1668099 will be needed to uplift the gtest on this bug.

  • String or UUID changes made by this patch: None
Attachment #9179374 - Flags: approval-mozilla-esr78?
Attachment #9179374 - Flags: approval-mozilla-beta?
Attachment #9179373 - Flags: approval-mozilla-beta?
Attachment #9179373 - Flags: approval-mozilla-esr78?

Comment on attachment 9179374 [details]
Bug 1618529 - Store clear crypto subsample info in uint32 rather than uint16. r?jolin

approved for 82.0b9 and 78.4.0esr

Attachment #9179374 - Flags: approval-mozilla-esr78?
Attachment #9179374 - Flags: approval-mozilla-esr78+
Attachment #9179374 - Flags: approval-mozilla-beta?
Attachment #9179374 - Flags: approval-mozilla-beta+
Attachment #9179373 - Flags: approval-mozilla-esr78?
Attachment #9179373 - Flags: approval-mozilla-esr78+
Attachment #9179373 - Flags: approval-mozilla-beta?
Attachment #9179373 - Flags: approval-mozilla-beta+

@Janliska Could you please confirm the fix on beta and esr? We don't have any credentials for the provided website, so if you could help us it would be greatly appreciated.
You can get the builds here:
Beta - https://archive.mozilla.org/pub/firefox/candidates/82.0b9-candidates/build1/
ESR - https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr78&revision=e2fade7a3527980ce662d28adb015c943017284d&selectedTaskRun=eVLWRSGfR3i3gEfFxNfQ8g.0

This https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/eVLWRSGfR3i3gEfFxNfQ8g/runs/0/artifacts/public/build/target.dmg would be for macOS specifically.

Flags: needinfo?(janliska)

I've reached out via email to confirm the fix with the organization that reported the issue. They've confirmed that this is fixed in nightly + beta.

Flags: needinfo?(janliska)

I confirm here as well. We've tested intensively on both nightly and beta. It's running perfectly fine. The issue has been solved.

Would you be so kind to confirm that the bug is also fixed on 78.4.0ESR (https://archive.mozilla.org/pub/firefox/candidates/78.4.0esr-candidates/build2/)?
Thanks!

Flags: needinfo?(tomas.bacik)
Flags: needinfo?(bvandyk)

(In reply to Camelia Badau [:cbadau], Release Desktop QA from comment #29)

Would you be so kind to confirm that the bug is also fixed on 78.4.0ESR (https://archive.mozilla.org/pub/firefox/candidates/78.4.0esr-candidates/build2/)?
Thanks!

Yes. I confirm the bug is also fixed here.

Flags: needinfo?(tomas.bacik)
Flags: needinfo?(bvandyk)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: