Investigate adding non-debug mode bounds checking to nsTArray

RESOLVED FIXED in Firefox 50

Status

()

RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: Ehsan, Assigned: Nika)

Tracking

({sec-want})

unspecified
mozilla51
sec-want
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50 fixed, firefox51 fixed)

Details

(Whiteboard: [adv-main50-])

Attachments

(2 attachments, 10 obsolete attachments)

2.14 KB, text/plain
Details
4.08 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
(Reporter)

Description

4 years ago
Right now, if the caller passes a garbage index to nsTArray, it will happily read past the bounds of the internal buffer.  It would be interesting to investigate the runtime overhead of adding non-debug mode bounds checking to see if that is something which we can turn on for our production builds.

If we find that it is too expensive in a few places, we may want to add unsafe variants of some functions for perf-sensitive code.
Isn't this what nsTArray::SafeElementAt() is for?

Also, what do you propose doing in production builds if ElementAt() does go out-of-bounds?  MOZ_CRASH?
(In reply to Ben Kelly [:bkelly] from comment #1)
> Isn't this what nsTArray::SafeElementAt() is for?
It would be better to be safe-by-default, and opt into unsafe indexing where it is necessary for performance.  Some kind of UnsafeElementAt() would also make reviewers more cautious.

> Also, what do you propose doing in production builds if ElementAt() does go
> out-of-bounds?  MOZ_CRASH?
Yes, that's pretty standard.
(Reporter)

Comment 3

4 years ago
(In reply to Andrew McCreight [:mccr8] from comment #2)
> (In reply to Ben Kelly [:bkelly] from comment #1)
> > Isn't this what nsTArray::SafeElementAt() is for?
> It would be better to be safe-by-default, and opt into unsafe indexing where
> it is necessary for performance.  Some kind of UnsafeElementAt() would also
> make reviewers more cautious.

Yeah.  Also note that there are other methods that access stuff within the array without any bounds checks as well.

> > Also, what do you propose doing in production builds if ElementAt() does go
> > out-of-bounds?  MOZ_CRASH?
> Yes, that's pretty standard.

Yes, indeed.  Those are bugs that we need to catch and fix -- we should not attempt to just recover from them by returning nullptr.  The goal here is to make those unexploitable crashes.
(Reporter)

Updated

4 years ago
Assignee: nobody → michael
(Assignee)

Comment 4

4 years ago
Created attachment 8605866 [details] [diff] [review]
Part 1: Add non debug mode bounds checking to nsTArray

I'm running talos tests on try to see if there is any significant (easily detectable) perf. difference:

Before patch - https://treeherder.mozilla.org/#/jobs?repo=try&revision=223faa9f5319
After patch - https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c0ce7024cd0
If you want to compare Talos pushes:

- push a baseline to try with '-b o -p linux64 -u none -t none -awsy nstarrayboundschecks'
- push bounds check patch to try with '-b o -p linux64 -u none -t none -awsy nstarrayboundschecks'
- eventually view results at https://areweslimyet.com/?series=nstarrayboundschecks&evenspacing=1'

(from bug 819090 comment 20)
(Reporter)

Comment 6

4 years ago
(In reply to Birunthan Mohanathas [:poiru] from comment #5)
> If you want to compare Talos pushes:
> 
> - push a baseline to try with '-b o -p linux64 -u none -t none -awsy
> nstarrayboundschecks'
> - push bounds check patch to try with '-b o -p linux64 -u none -t none -awsy
> nstarrayboundschecks'
> - eventually view results at
> https://areweslimyet.com/?series=nstarrayboundschecks&evenspacing=1'

Isn't that only useful for memory usage concerns though?  This change should not affect the memory usage, as far as I can tell.
Flags: needinfo?(birunthan)
(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #6)
> Isn't that only useful for memory usage concerns though?  This change should
> not affect the memory usage, as far as I can tell.

Gah, of course. Sorry, I will pay more attention in the future :)
Flags: needinfo?(birunthan)
(Assignee)

Comment 8

4 years ago
Created attachment 8616152 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

New patch which annotates the crash reports caused by bounds checking on nsTArray.
Attachment #8605866 - Attachment is obsolete: true
Attachment #8616152 - Flags: feedback?(ehsan)
(Assignee)

Comment 9

4 years ago
Created attachment 8616692 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray
Attachment #8616152 - Attachment is obsolete: true
Attachment #8616152 - Flags: feedback?(ehsan)
Attachment #8616692 - Flags: feedback?(ehsan)
(Reporter)

Comment 10

4 years ago
Comment on attachment 8616692 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Review of attachment 8616692 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great to me!

::: xpcom/glue/nsTArray.cpp
@@ +12,5 @@
>  
> +// We can only use the crash reporter when we are linking directly with libxul
> +// and the crash reporter is enabled
> +#if defined(MOZ_CRASHREPORTER) && defined(MOZILLA_INTERNAL_API) && \
> +  !defined(MOZILLA_EXTERNAL_LINKAGE)

In order to not repeat this sequence below, you can #define a macro here, and make the inclusion of nsExceptionHandler, and the code block below depend on that instead.
Attachment #8616692 - Flags: feedback?(ehsan) → feedback+
(Assignee)

Comment 11

4 years ago
Created attachment 8616719 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Made #if-s less repetitive
Attachment #8616692 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8616719 - Flags: feedback?(dmajor)
Comment on attachment 8616719 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Review of attachment 8616719 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is a good start. I'd recommend elaborating the messages to make it more clear on a crash report what's going on, e.g.
"invalid array index" -> "Accessing invalid array index"
"invalid aStart index" -> "Removing at invalid aStart index"
etc.

I suspect that once the crash reports start coming in, you'll find yourself hungry for more information. Why did this crash happen? Was aIndex too large by one? By ten? Did it underflow to 0xffffffff? Was it the poison value 0x5a5a5a5a?

Usually in those situations, people set a needinfo flag for me to dig through minidumps and find those values, but that's a slow process. Perhaps we could automate this by doing a printf-style message, like maybe:
NSTARRAY_BOUNDS_ASSERT(aIndex < Length(), "Accessing invalid index %u, length %u", aIndex, Length());

What do y'all think?
Attachment #8616719 - Flags: feedback?(dmajor) → feedback+
(Reporter)

Comment 13

4 years ago
I think that's a great idea!
(Assignee)

Comment 14

4 years ago
Created attachment 8616860 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Updated error message to include the values which caused the bound check failure.

Do you like the new error messages?
Attachment #8616719 - Attachment is obsolete: true
Attachment #8616860 - Flags: feedback?(dmajor)
Comment on attachment 8616860 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Review of attachment 8616860 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

You probably don't need to mention "nsTArray boundcheck" since that's already in the crash annotation key.
Attachment #8616860 - Flags: feedback?(dmajor) → feedback+
(Assignee)

Comment 16

4 years ago
Created attachment 8616941 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

I'm not sure if there is anything more I should do with the patch before we try to land it.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f59d5ce609cd
Attachment #8616860 - Attachment is obsolete: true
Attachment #8616941 - Flags: review?(ehsan)
(Reporter)

Comment 17

4 years ago
Comment on attachment 8616941 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Review of attachment 8616941 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but Nathan should do the final review, I think.

Do you have Talos measurements?  Did this move any needles?

::: xpcom/glue/nsTArray.cpp
@@ +40,5 @@
> +  // We don't use MOZ_CRASH, because we don't have the error message as a string literal,
> +  // instead it has been passed in as a const char*
> +  MOZ_ReportCrash(aMessage, aFilename, aLine);
> +  MOZ_REALLY_CRASH();
> +}

Nit: for hygiene purposes (because of unified builds), please #undef at the end of the file.

::: xpcom/glue/nsTArray.h
@@ +18,5 @@
>  #include "mozilla/ReverseIterator.h"
>  #include "mozilla/TypeTraits.h"
>  
>  #include <string.h>
> +#include <inttypes.h>

Please include "mozilla/IntegerPrintfMacros.h" instead (see the comments in that header for why.)

@@ +99,5 @@
> +
> +#define NSTARRAY_BOUNDS_ASSERT(expr, fmt_str, ...)                      \
> +  if (MOZ_UNLIKELY(!(expr))) {                                          \
> +    char msg[1024];                                                     \
> +    snprintf(msg, sizeof(msg), fmt_str, __VA_ARGS__); \

Nit: whitespace
Attachment #8616941 - Flags: review?(nfroyd)
Attachment #8616941 - Flags: review?(ehsan)
Attachment #8616941 - Flags: feedback+
Comment on attachment 8616941 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Review of attachment 8616941 [details] [diff] [review]:
-----------------------------------------------------------------

I don't have any problems with the patch per se, though you should probably redo your try push from Monday, since it looks like it got caught in infra problems, or something.  Building on all platforms and testing on one is good practice for patches like these:

-b do -p all -u all[x64] -t none

will only test on x86-64 Linux.

I will reiterate Ehsan's point about Talos measurements, or really, measurements of any kind, because I don't see any reviewing the history of the bug.  Do you have any measurements on a possible performance hit from this change?

r=me with Ehsan's comments and the below comments addressed, but landing this is conditional on performance measurements.

::: xpcom/glue/nsTArray.cpp
@@ +7,5 @@
>  #include <string.h>
>  #include "nsTArray.h"
>  #include "nsXPCOM.h"
>  #include "nsDebug.h"
>  #include "mozilla/CheckedInt.h"

Please explicitly include "mozilla/Assertions.h" here.

::: xpcom/glue/nsTArray.h
@@ +99,5 @@
> +
> +#define NSTARRAY_BOUNDS_ASSERT(expr, fmt_str, ...)                      \
> +  if (MOZ_UNLIKELY(!(expr))) {                                          \
> +    char msg[1024];                                                     \
> +    snprintf(msg, sizeof(msg), fmt_str, __VA_ARGS__); \

Please include "mozilla/Snprintf.h" and use snprintf_literal here.
Attachment #8616941 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 19

4 years ago
Created attachment 8624832 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

I'm not sure how this slipped under my radar, but I made these changes when you submitted your response and never pushed them >.>.

I'll have talos measurements up on here by the end of the day.
Attachment #8616941 - Attachment is obsolete: true
(Assignee)

Comment 20

4 years ago
I appear to be causing some "hazard"s, which, as far as I can tell, are due to the increased complexity of the nsTArray indexing operations - it seems as though some static/runtime analysis tool has noticed that an unrooted javascript reference could be kept alive over the call to ArrayBoundsCheckFailed, which theoretically could trigger a GC?

Here's the try log: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/michael@thelayzells.com-6410745db362/try-linux64-br-haz/linux64-br-haz_try_dep-bm76-try1-build153.txt.gz, and the hazard log: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/michael@thelayzells.com-6410745db362/try-linux64-br-haz/hazards.txt.gz

Any suggestions about how I could get around this (as I'm pretty sure that there is no way ArrayBoundsCheckFailed could trigger a GC, unless the crash reporter can trigger GCs, and if it did, then the program will crash immediately before any pointers could be dereferenced).
Flags: needinfo?(ehsan)
I'm guessing the hazard is because the analysis can't tell that ArrayBoundsCheckFailed will never return. I'm not sure if there's some annotation you can add to let the analysis know that. sfink might have some ideas.
(Assignee)

Comment 22

4 years ago
I could pull the MOZ_REALLY_CRASH out of the ArrayBoundsCheckFailed call and into the caller (immediately after the call to ArrayBoundsCheckFailed)? I'm not sure if that would make the analysis happy (as I'm not sure how it works).

Marking ArrayBoundsCheckFailed as never returning would probably be better, I'll look at the MFBT annotations to see if I can find one.
(In reply to Michael Layzell [:mystor] from comment #22)
> I could pull the MOZ_REALLY_CRASH out of the ArrayBoundsCheckFailed call and
> into the caller (immediately after the call to ArrayBoundsCheckFailed)? I'm
> not sure if that would make the analysis happy (as I'm not sure how it
> works).

I don't think it would. Even if I extended the analysis to understand noreturn functions, the code in the current patch would be fine.

> Marking ArrayBoundsCheckFailed as never returning would probably be better,
> I'll look at the MFBT annotations to see if I can find one.

Sadly, it wouldn't help anyway. The analysis doesn't understand noreturn functions. Or least, I don't think it does -- it's possible that the information ends up getting used for the control flow graph that the analysis sees. That would be a fairly easy thing to add -- it would just be a matter of killing all variables' live ranges if you ever called a noreturn function -- but given that this is the first time I've seen it needed, I'm inclined to just annotate this away.

It would probably be safe to claim that CrashReporter::AnnotateCrashReport never GCs, but I don't know that for 100% sure -- for all I know, there could be cases where we generate a crash report but don't actually crash. So for now, I'd go conservative and just lie and claim that ArrayBoundsCheckFailed can never GC.

I have messy stuff in my working directory at the moment, but if you add it to ignoreFunctions in js/src/devtools/rootAnalysis/annotations.js, it'll be fine. The patch will look like:

diff --git a/js/src/devtools/rootAnalysis/annotations.js b/js/src/devtools/rootAnalysis/annotations.js
--- a/js/src/devtools/rootAnalysis/annotations.js
+++ b/js/src/devtools/rootAnalysis/annotations.js
@@ -184,16 +184,19 @@ var ignoreFunctions = {
     // Similar to heap snapshot mock classes, and GTests below. This posts a
     // synchronous runnable when a GTest fails, and we are pretty sure that the
     // particular runnable it posts can't even GC, but the analysis isn't
     // currently smart enough to determine that. In either case, this is (a)
     // only in GTests, and (b) only when the Gtest has already failed. We have
     // static and dynamic checks for no GC in the non-test code, and in the test
     // code we fall back to only the dynamic checks.
     "void test::RingbufferDumper::OnTestPartResult(testing::TestPartResult*)" : true,
+
+    // This function will not return so any GCs it does are irrelevant.
+    "void ArrayBoundsCheckFailed(int8*, int8*, int32)": true,
 };
 
 function isProtobuf(name)
 {
     return name.match(/\bgoogle::protobuf\b/) ||
            name.match(/\bmozilla::devtools::protobuf\b/);
 }
 

and I can review it. Maybe I'll look into querying attributes, but it'll require plumbing, and there's no need for you to be blocked on it.
(Reporter)

Comment 24

4 years ago
I think Steve answered the question.  :-)  Clearing the ni.
Flags: needinfo?(ehsan)
(Assignee)

Comment 25

4 years ago
Created attachment 8626186 [details] [diff] [review]
Add non-debug mode bounds checking to nsTArray

Thanks. I decided to mark the function as MOZ_NORETURN anyways, because, well, it doesn't return.

Here's a try for the hazards (hopefully they're gone now): https://treeherder.mozilla.org/#/jobs?repo=try&revision=de2a3b3da818

Also, hopefully I should have the performace measurements in a few hours (finally).
Attachment #8624832 - Attachment is obsolete: true
Attachment #8626186 - Flags: review?(sphink)
Attachment #8626186 - Flags: review?(sphink) → review+
(Assignee)

Comment 26

4 years ago
There is definitely a perf regression, here are the results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=94465605689e&newProject=try&newRevision=6410745db362

I'm not sure if these performance losses are acceptable.
Flags: needinfo?(ehsan)
(Reporter)

Comment 27

4 years ago
IIRC there is a way to run the Talos tests under the Gecko Profiler on try.  Any chance you can check with Avi how that can be done, and get a profile from before and after to see if something very hot shows up in these profiles?

Also, can you please do a no-op try push on top of 94465605689e to see if that moves any needles?  :-)
Flags: needinfo?(ehsan)
On Windows in particular, I think we should do Talos comparisons on PGO builds. The failure-paths in these macros should be extremely cold and PGO should produce better code than a plain MSVC build.
(Assignee)

Comment 29

4 years ago
I've pushed a profiler run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f50c94bc5084

I'm not exactly sure what a PGO build is, but doing a Talos comparison on PGO builds should be doable.
Created attachment 8670009 [details]
most frequent ElementAt calls at startup and while idle

I wrote a little patch that logs the first few stack frames every time you call ElementAt(), then I wrote a script to analyze that to come up with the most frequent call sites. Don't take this too seriously, as it is all rather hacky, but it might be a good starting point. I was thinking we could add an UnsafeElementAt() method and call that for these hot spots. The top 7 in this log account for about 70% of the call sites. (Note that number 2 is debug-only: this was taken in a debug build, which is another reason to take this with a grain of salt.) I also need to try this on something besides startup.
(Assignee)

Comment 31

3 years ago
Once bug 1183355 lands, this patch could be rewritten to just use `if (MOZ_UNLIKELY(i > Length())) { MOZ_CRASH("nsTArray index out of bounds"); }`, which would probably simplify a lot of things.

We wouldn't get some handy information (such as the index which was attempted to be accessed and the length of the nsTArray), but it may be worth it.

Updated

3 years ago
Keywords: sec-want
(Assignee)

Comment 32

3 years ago
Created attachment 8671657 [details] [diff] [review]
Check nsTArray ElementAt indexing in release builds

This is a simplified version of the patch which takes advantage of the work done in bug 1211979.

It basically just s/MOZ_ASSERT/MOZ_RELEASE_ASSERT in ElementAt, and introduces UncheckedElementAt for performance-sensitive code.
Attachment #8626186 - Attachment is obsolete: true
Attachment #8671657 - Flags: review?(sphink)
Comment on attachment 8671657 [details] [diff] [review]
Check nsTArray ElementAt indexing in release builds

Review of attachment 8671657 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, but I'm not a peer here.

In glancing at the code, it *looked* like index_type is always size_t, but I wonder if it would be worth a static_assert to say that index_type is unsigned. That made me nervous at first glance. static_assert(index_type(-1) > 0) or something.
Attachment #8671657 - Flags: review?(sphink) → review?(nfroyd)
(In reply to Steve Fink [:sfink, :s:] from comment #33)
> static_assert(index_type(-1) > 0) or something.

Probably want static_assert(mozilla::IsUnsigned<index_type>::value) (from mozilla/TypeTraits.h)
Comment on attachment 8671657 [details] [diff] [review]
Check nsTArray ElementAt indexing in release builds

Review of attachment 8671657 [details] [diff] [review]:
-----------------------------------------------------------------

As before, I'm fine with this in principle; do we have actual performance numbers yet?  (I see a Talos push, but it looks...colorful, and I'm not sure what it actually tells me relative to a baseline (or even what the baseline is).)
Attachment #8671657 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 36

3 years ago
(In reply to Nathan Froyd [:froydnj] from comment #35)
> Comment on attachment 8671657 [details] [diff] [review]
> Check nsTArray ElementAt indexing in release builds
> 
> Review of attachment 8671657 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> As before, I'm fine with this in principle; do we have actual performance
> numbers yet?  (I see a Talos push, but it looks...colorful, and I'm not sure
> what it actually tells me relative to a baseline (or even what the baseline
> is).)

Nope - we don't have perf numbers yet, which is why I'm not planning to land this until we get something like bug 1211736.

I just wanted to have a more up-to-date and simpler patch attached which could be used for perf testing.
(Assignee)

Comment 37

2 years ago
Now that I know a little more about what I am doing than I did when I originally worked on this, I went back and ran a quick Talos test with the most trivial change which I could think of (changing MOZ_ASSERT to MOZ_RELEASE_ASSERT in the indexing operations). The results are here: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3bfeffee70871b623a62c1cb1cac0c53ef0157b2&newProject=try&newRevision=8a3c37200ba9a4b9ee14fad0e9e3a5fc2e9c38df&framework=1&showOnlyImportant=0

It definitely seems like a somewhat significant performance regression on some of the tests.
That it's only a11y tests seems quite odd to me; what's so special about a11y tests that bounds checking slows them down, but not something like page load tests?

If we're really confident that our Talos tests are a good measure of performance regressions, I think we should have someone investigate, or have an a11y person sign off on the small performance regression.  And then we should land this.
(Assignee)

Comment 39

2 years ago
Created attachment 8779823 [details] [diff] [review]
Add release mode bounds checking to nsTArray

This patch is different from the one it is replacing in that it doesn't add any UncheckedElementAt methods.
Those can be added if it is discovered they are needed for performance purposes.
Attachment #8779823 - Flags: review?(nfroyd)
(Assignee)

Updated

2 years ago
Attachment #8671657 - Attachment is obsolete: true
Comment on attachment 8779823 [details] [diff] [review]
Add release mode bounds checking to nsTArray

Review of attachment 8779823 [details] [diff] [review]:
-----------------------------------------------------------------

Let's try it; maybe we'll have to downgrade to MOZ_DIAGNOSTIC_ASSERT at some point if the crashes are just terrible.
Attachment #8779823 - Flags: review?(nfroyd) → review+
Are we abandoning the approach that provided more information in the messages?
Flags: needinfo?(michael)
(Assignee)

Comment 42

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #41)
> Are we abandoning the approach that provided more information in the
> messages?

I figured I would start with the simplest solution to the problem, which was to just change from a MOZ_ASSERT to MOZ_RELEASE_ASSERT. I don't imagine it would be too hard to provide more information as well, such as what the index was and what the capacity was. I'll attach another patch and also do a try run with it, just to check if adding that extra info degrades performance.
Flags: needinfo?(michael)
(Assignee)

Comment 43

2 years ago
Created attachment 8780240 [details] [diff] [review]
Add release mode bounds checking with custom annotations to nsTArray

This is my pass at implementing custom annotations for these nsTArray crashes. I just store the index and length in the MozCrashReason field, as the value which would have been stored there was very useless.
Attachment #8780240 - Flags: review?(nfroyd)
Attachment #8780240 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 44

2 years ago
This is the talos run for the custom annotations code:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3bfeffee70871b623a62c1cb1cac0c53ef0157b2&newProject=try&newRevision=9e7f1b107d1231f2fffda60a176bc0e9cdf45684&framework=1&showOnlyImportant=0

froydnj: Is this OK to land as-is, or do we need to do some perf work before we can land it?
Flags: needinfo?(nfroyd)
The tp5o responsiveness opt e10s test on Windows looks pretty bad...I wonder what happened there?
Flags: needinfo?(nfroyd)
(Assignee)

Comment 46

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #45)
> The tp5o responsiveness opt e10s test on Windows looks pretty bad...I wonder
> what happened there?

I'm re-executing those tests 5 more times to get some more precision just to confirm that it's real.
(Assignee)

Comment 47

2 years ago
After re-triggering more times, that test failure seems to have disappeared.
Flags: needinfo?(nfroyd)
(In reply to Michael Layzell [:mystor] from comment #47)
> After re-triggering more times, that test failure seems to have disappeared.

Consistent performance tests are so great.  Let's land it!
Flags: needinfo?(nfroyd)

Comment 49

2 years ago
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c17b64aac66
Add release mode bounds checking with custom annotations to nsTArray, r=froydnj

Comment 50

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0c17b64aac66
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Could you consider uplifting this to Aurora once it seems stable? It seems like it will fix a number of hidden security issues, so it would be nice to get it out a release earlier if possible.
Flags: needinfo?(michael)
(Assignee)

Updated

2 years ago
Attachment #8779823 - Attachment is obsolete: true
(Assignee)

Comment 52

2 years ago
Comment on attachment 8780240 [details] [diff] [review]
Add release mode bounds checking with custom annotations to nsTArray

I don't really see a good reason to not uplift if we're OK with causing some more crashes in aurora in exchange for finding & fixing these bugs faster.

Approval Request Comment
[Feature/regressing bug #]: No bug
[User impact if declined]: Potentially sec-critical bugs due to out of bounds array accesses will not be caught in aurora builds.
[Describe test coverage new/current, TreeHerder]: None - changes a MOZ_ASSERT to a release assert and annotates the crash report with useful information. nsTArray is used a lot in our test code.
[Risks and why]: This will likely increase crashes in aurora due to out of bounds array accesses crashing instead of being a sec bug. The code itself is fairly low risk, and has been working in nightly.
[String/UUID change made/needed]: None
Flags: needinfo?(michael)
Attachment #8780240 - Flags: approval-mozilla-aurora?
status-firefox50: --- → affected
Comment on attachment 8780240 [details] [diff] [review]
Add release mode bounds checking with custom annotations to nsTArray

It's a hard trade-off, crashiness vs. being more secure. I am ok to uplift this patch and see how much this negatively impacts aurora stability. If things are really bad stability wise, we may need to revisit this decision and perhaps back this one out and let it ride the 51 train.
Attachment #8780240 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 54

2 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/20f871385189
status-firefox50: affected → fixed
Here's a search that shows all the crashes caused by failed bounds checks in the past 7 days:

https://crash-stats.mozilla.com/search/?moz_crash_reason=~ElementAt&product=Firefox&_sort=-date&_facets=signature&_facets=moz_crash_reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

> 1 	InvalidArrayIndex_CRASH | mozilla::a11y::DocAccessibleChildBase::SetMsaaIds   Add term 	403 	84.13 % 	1304130
> 2 	InvalidArrayIndex_CRASH | mozilla::EventListenerManager::GetListenerInfo   Add term 	19 	3.97 % 	
> 3 	InvalidArrayIndex_CRASH | mozilla::ContentCache::TextRectArray::GetUnionRectAsFarAsPossible   Add term 	16 	3.34 % 	1291082
> 4 	InvalidArrayIndex_CRASH | mozilla::MediaEngineCameraVideoSource::GetCapability   Add term 	14 	2.92 % 	
> 5 	InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | mozilla::widget::IMMHandler::DispatchCompositionChangeEvent   Add term 	9 	1.88 % 	
> 6 	InvalidArrayIndex_CRASH | mozilla::dom::AudioNode::DisconnectFromOutputIfConnected   Add term 	4 	0.84 % 	
> 7 	InvalidArrayIndex_CRASH | mozilla::gfx::impl::VRDisplayOculus::GetNextRenderTarget   Add term 	3 	0.63 % 	1297105
> 8 	InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | mozilla::widget::IMMHandler::HandleComposition   Add term 	3 	0.63 % 	
> 9 	InvalidArrayIndex_CRASH | nsUrlClassifierPrefixSet::GetPrefixesNative   Add term 	3 	0.63 % 	
> 10 	xul.dll@0x164dd1 | xul.dll@0xfe2efc | xul.dll@0x1169f2a | xul.dll@0x116eb35 | xul.dll@0x11700b4 | xul.dll@0x117f142 | xul.dll@0x1189b94 | xul.dll@0x11904af | xul.dll@0x1194ffb | xul.dll@0x11408b | xul.dll@0x1194f0a | InternalCallWinProc   Add term 	2 	0.42 % 	

If you click on the "Moz crash reason facet" tab, you can see this:

> 1 	ElementAt(aIndex = 3, aLength = 3) 	109 	22.76 %
> 2 	ElementAt(aIndex = 2, aLength = 2) 	86 	17.95 %
> 3 	ElementAt(aIndex = 1, aLength = 1) 	77 	16.08 %
> 4 	ElementAt(aIndex = 4, aLength = 4) 	38 	7.93 %
> 5 	ElementAt(aIndex = 5, aLength = 5) 	36 	7.52 %
> 6 	ElementAt(aIndex = 6, aLength = 6) 	35 	7.31 %
> 7 	ElementAt(aIndex = 4294967295, aLength = 0) 	17 	3.55 %
> 8 	ElementAt(aIndex = 0, aLength = 0) 	8 	1.67 %
> 9 	ElementAt(aIndex = 27, aLength = 27) 	8 	1.67 %
> 10 	ElementAt(aIndex = 10, aLength = 10) 	7 	1.46 %

Probably the most interesting thing there is #7.
Bug 1304130, bug 1291082, and bug 1297105 were already filed for three of those top 10 crash signatures I mentioned.

For bug 1291082 we already had the crash on file, but the crash signature changed when bounds checking was introduced, which is good. (That bug covers the case where the index is 4294967295.)

I also filed bug 1304596, bug 1304597, bug 1304599, bug 1304601, and 1304602. That gives us bugs for 8 of the top 10.

Anyway, nice job implementing this! It's catching real bugs in the wild :)

Updated

2 years ago
Depends on: 1312136
Whiteboard: [adv-main50-]

Updated

2 years ago
See Also: → bug 1311672
You need to log in before you can comment on or make changes to this bug.