Closed Bug 1533777 Opened 6 years ago Closed 6 years ago

Change assert in CheckedInt::value to a MOZ_RELEASE_ASSERT

Categories

(Core :: MFBT, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: Alex_Gaynor, Assigned: Alex_Gaynor)

References

Details

(Keywords: sec-want, Whiteboard: [post-critsmash-triage][adv-main68-])

Crash Data

Attachments

(1 file)

Right now it's a MOZ_ASSERT. This means that if you forget to call isValid(), in a release build you'll get an overflowed value, leading to sadness.

The good news is that since all correct callers should already be checking isValid(), the compiler should be able to easily remove the MOZ_RELEASE_ASSERT -- no performance regression for incorrect code, extra safety for incorrect code! Huzzah!

Keywords: checkin-needed
Group: core-security → core-security-release
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Thanks!

Depends on: 1534893

I believe that pushing this in the freeze zone is dangerous.

the assumption that all caller would be doing a check first is obviously incorrect.

Flags: needinfo?(agaynor)
Flags: needinfo?(nfroyd)

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #0)

The good news is that since all correct callers should already be checking isValid(), the compiler should be able to easily remove the MOZ_RELEASE_ASSERT -- no performance regression for incorrect code, extra safety for incorrect code! Huzzah!

I seriously doubt that this statement is correct even if the caller was doing the right thing.

Backed out changeset 2f2a20c16289 (bug 1533777) for crashing youtube on request by jya

Backout: https://hg.mozilla.org/mozilla-central/rev/9067457d7dcf72804288f61fc18326f31b2fc140

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla67 → ---
Crash Signature: [@ mozilla::CheckedInt<T>::value ]
Group: core-security-release → core-security

(In reply to Jean-Yves Avenard [:jya] from comment #5)

the assumption that all caller would be doing a check first is obviously incorrect.

Doing the check before calling value() is the documented way of using this class. If callers aren't following the contract, they should be fixed. What would value() contain if isValid() is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #8)

(In reply to Jean-Yves Avenard [:jya] from comment #5)

the assumption that all caller would be doing a check first is obviously incorrect.

Doing the check before calling value() is the documented way of using this class. If callers aren't following the contract, they should be fixed. What would value() contain if isValid() is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)

Absolutely. It should be fixed.

However, this should go after 67 has merged to beta, so we have time fixing the invalid use without rushing.

I also agree with your review comment that a DIAGNOSTIC_ASSERT would have been preferred.

(In reply to Jean-Yves Avenard [:jya] from comment #9)

(In reply to Nathan Froyd [:froydnj] from comment #8)

(In reply to Jean-Yves Avenard [:jya] from comment #5)

the assumption that all caller would be doing a check first is obviously incorrect.

Doing the check before calling value() is the documented way of using this class. If callers aren't following the contract, they should be fixed. What would value() contain if isValid() is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)

Absolutely. It should be fixed.

However, this should go after 67 has merged to beta, so we have time fixing the invalid use without rushing.

We are in violent agreement.

I also agree with your review comment that a DIAGNOSTIC_ASSERT would have been preferred.

I think Alex's explanation of why the compiler should be able to optimize it away is a correct assessment of the situation. OTOH, I suppose it's possible that for something like media's TimeUnit, you could have a collection of TimeUnits where some caller has ensured that all members of the collection are isValid() and therefore you can call value() when manipulating the collection without penalty. (Or other similar scenarios.) I think that such a scenario is still analogous to why we unconditionally turn on array bounds checking. Maybe we could provide escape hatches, but I'd want to see some performance numbers first.

Yup -- landing during the freeze period was a mistake, sorry about that.

There are 2 crashes with signature 'mozilla::media::TimeUnit::operator>=' and the moz_crash_reason is:
MOZ_RELEASE_ASSERT(mIsValid) (Invalid checked integer (division by zero or integer overflow))

Crash Signature: [@ mozilla::CheckedInt<T>::value ] → [@ mozilla::CheckedInt<T>::value ] [@ mozilla::media::TimeUnit::operator>=]

Looks like :jya fixed at least one of them -- are these all the same root cause, or do they need their own fixes?

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #13)

Looks like :jya fixed at least one of them -- are these all the same root cause, or do they need their own fixes?

so far they are different.
The one I fixed will likely have significant number by tomorrow, it makes YouTube crash as soon as you seek. And YT is over 65% of what people spend time on with a web browser!

When I made mEnd >= mStart a MOZ_DIAGNOSTIC_ASSERT in Intervals.h (bug 1530322) (most of them using TimeUnits), it took over 2 weeks to eradicate them all, and there's still 2-3 crashes daily for remaining issue.

The problem here, in all these case, we're dealing with external user data; and while the vast majority of all cases are covered there are still some.

Some sites in particular use EPOCH time measured in microseconds and then they play with media offsets. that's a lot of cases to handle.

Eeep! Is it possible to write tests for these cases, so we (I) won't accidentally break them again?

(In reply to Alex Gaynor [:Alex_Gaynor] from comment #15)

Eeep! Is it possible to write tests for these cases, so we (I) won't accidentally break them again?

for those solved in those past few weeks, I have opened 1531991.

I will ping the fuzzing team to re-do a run targeting that code

Depends on: 1534993
Group: core-security → core-security-release

:jya, since bug 1534993 is still open I'm not planning on relanding this. Do you expect to be able to get to that bug sometime early in the cycle?

Flags: needinfo?(agaynor) → needinfo?(jyavenard)

I'd be okay to push that in now, we have enough time in the cycle to look into it.

Flags: needinfo?(jyavenard)

Thanks!

Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Hi Alex, since 67 is marked as affected, should we consider uplifting this to Beta67? If yes, please nominate a patch for uplift.

Flags: needinfo?(agaynor)

(In reply to Ritu Kothari (:ritu) from comment #23)

Hi Alex, since 67 is marked as affected, should we consider uplifting this to Beta67? If yes, please nominate a patch for uplift.

please don't.

Yes, like :jya said, this should not be uplifted.

Flags: needinfo?(agaynor)

There 26 crashes with signature 'mozilla::CheckedInt<T>::value' in nightly 68 starting with buildid 20190329131714.
The moz_crash_reason is always MOZ_RELEASE_ASSERT(mIsValid) (Invalid checked integer (division by zero or integer overflow)).

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Finding those issues is the expected result of this -- we should file bugs blocking this for those (e.g. bug 1534893), I don't think reopening this bug is correct.

Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Depends on: 1540740
Depends on: 1540748
Depends on: 1540746
Depends on: 1540744

I've filed bugs for all of the assertions that have shown up on Nightly so far, splitting them based on the top most "interesting" non-generic library frame. Of note, they are all on Android, and all in media code. It is also possible there are other crashes with different signatures, but I haven't been able to get the search on raw crash reason to work.

Ok, I did get the raw crash reason to work. I think I had the product set to Firefox, which excluded Android. Every crash where the raw reason contains "mIsValid" does have the signature mozilla::CheckedInt<T>::value.

Flags: qe-verify-
Whiteboard: [post-critsmash-triage]
No longer depends on: 1540746
Regressions: 1540746
Depends on: 1548409
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main68+]
Whiteboard: [post-critsmash-triage][adv-main68+] → [post-critsmash-triage][adv-main68-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: