Change assert in CheckedInt::value to a MOZ_RELEASE_ASSERT
Categories
(Core :: MFBT, defect, P2)
Tracking
()
People
(Reporter: Alex_Gaynor, Assigned: Alex_Gaynor)
References
Details
(Keywords: sec-want, Whiteboard: [post-critsmash-triage][adv-main68-])
Crash Data
Attachments
(1 file)
Right now it's a MOZ_ASSERT. This means that if you forget to call isValid(), in a release build you'll get an overflowed value, leading to sadness.
The good news is that since all correct callers should already be checking isValid(), the compiler should be able to easily remove the MOZ_RELEASE_ASSERT -- no performance regression for incorrect code, extra safety for incorrect code! Huzzah!
| Assignee | ||
Comment 1•6 years ago
|
||
| Assignee | ||
Updated•6 years ago
|
Comment 2•6 years ago
|
||
Comment 3•6 years ago
|
||
Comment 4•6 years ago
|
||
Thanks!
Updated•6 years ago
|
Comment 5•6 years ago
|
||
I believe that pushing this in the freeze zone is dangerous.
the assumption that all caller would be doing a check first is obviously incorrect.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 6•6 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #0)
The good news is that since all correct callers should already be checking
isValid(), the compiler should be able to easily remove theMOZ_RELEASE_ASSERT-- no performance regression for incorrect code, extra safety for incorrect code! Huzzah!
I seriously doubt that this statement is correct even if the caller was doing the right thing.
Comment 7•6 years ago
|
||
Backed out changeset 2f2a20c16289 (bug 1533777) for crashing youtube on request by jya
Backout: https://hg.mozilla.org/mozilla-central/rev/9067457d7dcf72804288f61fc18326f31b2fc140
Updated•6 years ago
|
Updated•6 years ago
|
Comment 8•6 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #5)
the assumption that all caller would be doing a check first is obviously incorrect.
Doing the check before calling value() is the documented way of using this class. If callers aren't following the contract, they should be fixed. What would value() contain if isValid() is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)
Comment 9•6 years ago
•
|
||
(In reply to Nathan Froyd [:froydnj] from comment #8)
(In reply to Jean-Yves Avenard [:jya] from comment #5)
the assumption that all caller would be doing a check first is obviously incorrect.
Doing the check before calling
value()is the documented way of using this class. If callers aren't following the contract, they should be fixed. What wouldvalue()contain ifisValid()is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)
Absolutely. It should be fixed.
However, this should go after 67 has merged to beta, so we have time fixing the invalid use without rushing.
I also agree with your review comment that a DIAGNOSTIC_ASSERT would have been preferred.
Comment 10•6 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #9)
(In reply to Nathan Froyd [:froydnj] from comment #8)
(In reply to Jean-Yves Avenard [:jya] from comment #5)
the assumption that all caller would be doing a check first is obviously incorrect.
Doing the check before calling
value()is the documented way of using this class. If callers aren't following the contract, they should be fixed. What wouldvalue()contain ifisValid()is false? (I see that we have made it contain 0 in many cases, which has probably encouraged people to rely on incorrect API usage.)Absolutely. It should be fixed.
However, this should go after 67 has merged to beta, so we have time fixing the invalid use without rushing.
We are in violent agreement.
I also agree with your review comment that a DIAGNOSTIC_ASSERT would have been preferred.
I think Alex's explanation of why the compiler should be able to optimize it away is a correct assessment of the situation. OTOH, I suppose it's possible that for something like media's TimeUnit, you could have a collection of TimeUnits where some caller has ensured that all members of the collection are isValid() and therefore you can call value() when manipulating the collection without penalty. (Or other similar scenarios.) I think that such a scenario is still analogous to why we unconditionally turn on array bounds checking. Maybe we could provide escape hatches, but I'd want to see some performance numbers first.
| Assignee | ||
Comment 11•6 years ago
|
||
Yup -- landing during the freeze period was a mistake, sorry about that.
Comment 12•6 years ago
|
||
There are 2 crashes with signature 'mozilla::media::TimeUnit::operator>=' and the moz_crash_reason is:
MOZ_RELEASE_ASSERT(mIsValid) (Invalid checked integer (division by zero or integer overflow))
| Assignee | ||
Comment 13•6 years ago
|
||
Looks like :jya fixed at least one of them -- are these all the same root cause, or do they need their own fixes?
Comment 14•6 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #13)
Looks like :jya fixed at least one of them -- are these all the same root cause, or do they need their own fixes?
so far they are different.
The one I fixed will likely have significant number by tomorrow, it makes YouTube crash as soon as you seek. And YT is over 65% of what people spend time on with a web browser!
When I made mEnd >= mStart a MOZ_DIAGNOSTIC_ASSERT in Intervals.h (bug 1530322) (most of them using TimeUnits), it took over 2 weeks to eradicate them all, and there's still 2-3 crashes daily for remaining issue.
The problem here, in all these case, we're dealing with external user data; and while the vast majority of all cases are covered there are still some.
Some sites in particular use EPOCH time measured in microseconds and then they play with media offsets. that's a lot of cases to handle.
| Assignee | ||
Comment 15•6 years ago
|
||
Eeep! Is it possible to write tests for these cases, so we (I) won't accidentally break them again?
Comment 16•6 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #15)
Eeep! Is it possible to write tests for these cases, so we (I) won't accidentally break them again?
for those solved in those past few weeks, I have opened 1531991.
I will ping the fuzzing team to re-do a run targeting that code
Updated•6 years ago
|
| Assignee | ||
Comment 17•6 years ago
|
||
:jya, since bug 1534993 is still open I'm not planning on relanding this. Do you expect to be able to get to that bug sometime early in the cycle?
Comment 18•6 years ago
|
||
I'd be okay to push that in now, we have enough time in the cycle to look into it.
Comment 20•6 years ago
|
||
Comment 21•6 years ago
|
||
Backed out because arc changed the author when it got pulled and pushed to be able to land it again with Lando: https://hg.mozilla.org/integration/autoland/rev/07595d5c98d2137f0a7dd4334b0c315988b8642e
Re-landed: https://hg.mozilla.org/integration/autoland/rev/16f19322ec762261a5abe9e70c935e6e6bf90582
Comment 22•6 years ago
|
||
Hi Alex, since 67 is marked as affected, should we consider uplifting this to Beta67? If yes, please nominate a patch for uplift.
Comment 24•6 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #23)
Hi Alex, since 67 is marked as affected, should we consider uplifting this to Beta67? If yes, please nominate a patch for uplift.
please don't.
| Assignee | ||
Comment 25•6 years ago
|
||
Yes, like :jya said, this should not be uplifted.
Thanks, with that I'll mark this wontfix for 67.
Comment 27•6 years ago
|
||
There 26 crashes with signature 'mozilla::CheckedInt<T>::value' in nightly 68 starting with buildid 20190329131714.
The moz_crash_reason is always MOZ_RELEASE_ASSERT(mIsValid) (Invalid checked integer (division by zero or integer overflow)).
| Assignee | ||
Comment 28•6 years ago
|
||
Finding those issues is the expected result of this -- we should file bugs blocking this for those (e.g. bug 1534893), I don't think reopening this bug is correct.
Comment 30•6 years ago
|
||
I've filed bugs for all of the assertions that have shown up on Nightly so far, splitting them based on the top most "interesting" non-generic library frame. Of note, they are all on Android, and all in media code. It is also possible there are other crashes with different signatures, but I haven't been able to get the search on raw crash reason to work.
Comment 31•6 years ago
|
||
Ok, I did get the raw crash reason to work. I think I had the product set to Firefox, which excluded Android. Every crash where the raw reason contains "mIsValid" does have the signature mozilla::CheckedInt<T>::value.
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•5 years ago
|
Description
•