Open Bug 279923 Opened 20 years ago Updated 2 years ago

Assertions should be fatal

Categories

(Firefox Build System :: General, defect, P3)

defect

Tracking

(Not tracked)

People

(Reporter: brendan, Unassigned)

References

(Depends on 5 open bugs, )

Details

We should fix the bogus assertions that would turn debug-build tinderboxes
orange were we to make them fatal.

We may need to change some to NS_ENSURE_* (or add runtime checks afterward, if
local style eschews NS_ENSURE and its hidden return).

We may need to remove some assertions that can't be fixed.

This bug is meant to track other specific bugs on individual bogus assertions?

/be
See Bug 203724 "NS_PRECONDITION: Truth in advertising (2)"
No longer depends on: 282653
We need to make sure NS_WARN_IF_FALSE() is actually a warning rather than an
alias for NS_ASSERTION(). Or get rid of it, but it was never intended to assert.
Depends on: 285519
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla1.9alpha
Time to start flipping this on for tinderboxes.  Which ones first?

/be
It's been done on the *one* debug tinderbox that has tests enabled, balsa on
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox .  I'll try
uncommenting the one assertion that I had to comment out; it may have since gone
away.
Will and Mac or Windows tinderboxen have this change in the near future?  (I'm hoping so, because Mac asserts a lot on startup and shutdown, and it's been getting worse over the months.)
I need help with this -- Rob, can you steal the bug and make sure we have tinderbox coverage for tier 1 platforms with assertions fatal (XPCOM_DEBUG_BREAK set to abort in the environment)? Thanks,

/be
(Note that those bugs do have dependencies -- probably somewhat inaccurate by now -- but I suspect if we had tinderboxes running and orange, we could get the necessary fixes to turn them green in within a few days.)
Can we make the assertions fatal right now EXCEPT for the ones we know we currently hit?  Holding the tree hostage for those fixes seems unnecessary, but new assertion failures are going to continue to be introduced until new assertions are actually made fatal.
If we had a tinderbox (which we do for Linux), we could probably change the few we hit into warnings, temporarily, or add other workarounds.  Want to go through the log and comment on the appropriate bug?

Making them fatal on tinderbox is a first step; we need to address other common ones that tinderbox doesn't hit as well before making them fatal for everyone.
Depends on: 385276
Depends on: 372581
Depends on: 404077
Target Milestone: mozilla1.9alpha1 → ---
Could we get ourselves on the right trajectory sooner just by making Tinderbox/talos count the number of assertions during a build so that at least we can notice when new assertions get added and can back them out?
(In reply to comment #11)
> Could we get ourselves on the right trajectory sooner just by making
> Tinderbox/talos count the number of assertions during a build so that at least
> we can notice when new assertions get added and can back them out?

That won't trigger if new code adds a new assertion, and removes another old one in the process, but I'm sure this is (hopefully) easy to overcome.
I just checked the Firefox leak boxes, the only remaining assertion at the moment is bug 459666, (the fix for which is in bug 458453 which is waiting on bug 455314).

Once that bug is fixed, we should be able to switch the flag on the leak test tinderboxes. Is there any work to be done on this? i.e. if we want to turn orange rather than red?
Depends on: 459666
Depends on: 463681
No longer depends on: 459666
(In reply to comment #13)
> I just checked the Firefox leak boxes, the only remaining assertion at the
> moment is bug 459666, (the fix for which is in bug 458453 which is waiting on
> bug 455314).
> 
> Once that bug is fixed, we should be able to switch the flag on the leak test
> tinderboxes. Is there any work to be done on this? i.e. if we want to turn
> orange rather than red?

All remaining assertions seem to be fixed based on comment #13, nominating wanted1.9.1? to hopefully turn this on before Fx3.1/5 is released. 

Don't know if it's that important to warrant blocking (thus the wanted flag), but assertions have been known to eventually expose deeper, wider, sometimes major security problems, esp bug 355829 and bug 369696, we would be ignoring them at our own peril (if we do choose to ignore, which we hopefully don't).
Flags: wanted1.9.1?
Bug 463681, which I think is what you're asking for (at least based on the text you quoted from comment 13), was fixed a while ago, but we're not yet ready for fatal assertions in all tests.

The next step in making assertions fatal-in-general is bug 472557 (and then doing the same for mochitest), which in turn depends on bug 372581, and I haven't been able to convince anyone to make that a priority despite its importance for security (as you mentioned in comment 14).
QA Contact: build-config
Taking at Brendan's request, since I'm working on it somewhat more actively than he is, although I'm not expecting it to be fixed in the very near future either.
Assignee: brendan → dbaron
Depends on: 694517
Depends on: 847275
Depends on: 850035
Depends on: 854195
Depends on: 820466
Depends on: 854196
So after living with the world created by bug 472557, bug 404077, and hopefully soon bug 847275, which were originally designed as a mechanism to help us on the road towards making assertions fatal, I've actually started to reconsider whether making them fatal is actually the best approach.

In particular, with our automated tests, it's useful to get complete test runs so that one has the complete list of test failures.  I rather dislike compiler warnings-as-errors because bustage from it causes (often platform-specific) compilation failures, which means no test runs for that build.  I'd rather it caused orange and let the tests continue.  I'm happier with the current unit testing situation (especially in terms of tree management, i.e., dealing with backing out things that break the tree) where assertions cause orange but allow the test runs to continue.  For tree management, in general, I think it's preferable for as few test failures to be fatal (i.e., cause later tests not to run) as possible.

I think I feel roughly the same way as part of a development process.  Given how long our builds and test runs take, I frequently start them and go do something else; it's annoying to discover they failed quickly, and preferable to have a complete list of the failures (e.g., compiler warnings, assertions).

That said, there's certainly a tradeoff in terms of being able to miss failures that one should have noticed sooner.  I don't see an obvious fix for that, although it's possible that we could have assertions fatal-by-default, but NS_ASSERTION keep its current behavior inside unit test runs that mark unexpected assertions as failures.  That's rather ugly, though.

On the other hand, with the number of existing NS_ASSERTIONs that fire, this is all theoretical for now.
With expected NS_ASSERTION counting as failure in most tests (including fuzzing), and key safety invariants using the fatal MOZ_ASSERT instead of NS_ASSERTION, I think we're in pretty good shape.
(In reply to David Baron [:dbaron] (UTC-8) (needinfo? for questions) from comment #17)
> In particular, with our automated tests, it's useful to get complete test
> runs so that one has the complete list of test failures.

This is one of my top annoyances with our CI infrastructure as well. It would be greatly preferable if a crash in a certain test would cause the test harness to simply start a new Firefox process and restart at the next test. (Presumably with some sort of limit to the number of times we do this - if we're crashing in *every* test, at a certain point we probably just want to fail.)
That's what the web-platform-tests runs already do; I expect us to move further in that direction. This is getting rather off-topic for this bug, though; please file another bug.
Product: Core → Firefox Build System
Assignee: dbaron → nobody
Status: ASSIGNED → NEW
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 50 CCs.
:ahochheiden, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahochheiden)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(ahochheiden)
You need to log in before you can comment on or make changes to this bug.