Closed Bug 1159244 Opened 10 years ago Closed 9 years ago

Investigate adding non-debug mode bounds checking to nsTArray

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla51

Tracking Flags:

Tracking

Status

firefox50

---

fixed

firefox51

---

fixed

People

(Reporter: ehsan.akhgari, Assigned: nika)

References

Details

(Keywords: sec-want, Whiteboard: [adv-main50-])

Attachments

(2 files, 10 obsolete files)

Part 1: Add non debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 1.87 KB, patch		Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 5.47 KB, patch		Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 5.87 KB, patch	ehsan.akhgari : feedback+	Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 5.82 KB, patch	away : feedback+	Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 7.14 KB, patch	away : feedback+	Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 7.12 KB, patch	froydnj : review+ ehsan.akhgari : feedback+	Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 7.34 KB, patch		Details \| Diff \| Splinter Review
Add non-debug mode bounds checking to nsTArray 10 years ago Nika Layzell [:nika] (ni? for response) 8.52 KB, patch	sfink : review+	Details \| Diff \| Splinter Review
most frequent ElementAt calls at startup and while idle 9 years ago Andrew McCreight [:mccr8] 2.14 KB, text/plain		Details
Check nsTArray ElementAt indexing in release builds 9 years ago Nika Layzell [:nika] (ni? for response) 2.46 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Add release mode bounds checking to nsTArray 9 years ago Nika Layzell [:nika] (ni? for response) 1.54 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
Add release mode bounds checking with custom annotations to nsTArray 9 years ago Nika Layzell [:nika] (ni? for response) 4.08 KB, patch	froydnj : review+ ritu : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

(no longer active)

Reporter

Description

•

10 years ago

Right now, if the caller passes a garbage index to nsTArray, it will happily read past the bounds of the internal buffer. It would be interesting to investigate the runtime overhead of adding non-debug mode bounds checking to see if that is something which we can turn on for our production builds. If we find that it is too expensive in a few places, we may want to add unsafe variants of some functions for perf-sensitive code.

Ben Kelly [:bkelly, not reviewing]

Comment 1

•

10 years ago

Isn't this what nsTArray::SafeElementAt() is for? Also, what do you propose doing in production builds if ElementAt() does go out-of-bounds? MOZ_CRASH?

Andrew McCreight [:mccr8]

Comment 2

•

10 years ago

(In reply to Ben Kelly [:bkelly] from comment #1) > Isn't this what nsTArray::SafeElementAt() is for? It would be better to be safe-by-default, and opt into unsafe indexing where it is necessary for performance. Some kind of UnsafeElementAt() would also make reviewers more cautious. > Also, what do you propose doing in production builds if ElementAt() does go > out-of-bounds? MOZ_CRASH? Yes, that's pretty standard.

(no longer active)

Reporter

Comment 3

•

10 years ago

(In reply to Andrew McCreight [:mccr8] from comment #2) > (In reply to Ben Kelly [:bkelly] from comment #1) > > Isn't this what nsTArray::SafeElementAt() is for? > It would be better to be safe-by-default, and opt into unsafe indexing where > it is necessary for performance. Some kind of UnsafeElementAt() would also > make reviewers more cautious. Yeah. Also note that there are other methods that access stuff within the array without any bounds checks as well. > > Also, what do you propose doing in production builds if ElementAt() does go > > out-of-bounds? MOZ_CRASH? > Yes, that's pretty standard. Yes, indeed. Those are bugs that we need to catch and fix -- we should not attempt to just recover from them by returning nullptr. The goal here is to make those unexploitable crashes.

(no longer active)

Reporter

Updated

•

10 years ago

Assignee: nobody → michael

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 4

•

10 years ago

Attached patch Part 1: Add non debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

I'm running talos tests on try to see if there is any significant (easily detectable) perf. difference: Before patch - https://treeherder.mozilla.org/#/jobs?repo=try&revision=223faa9f5319 After patch - https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c0ce7024cd0

Biru [:poiru]

Comment 5

•

10 years ago

If you want to compare Talos pushes: - push a baseline to try with '-b o -p linux64 -u none -t none -awsy nstarrayboundschecks' - push bounds check patch to try with '-b o -p linux64 -u none -t none -awsy nstarrayboundschecks' - eventually view results at https://areweslimyet.com/?series=nstarrayboundschecks&evenspacing=1' (from bug 819090 comment 20)

(no longer active)

Reporter

Comment 6

•

10 years ago

(In reply to Birunthan Mohanathas [:poiru] from comment #5) > If you want to compare Talos pushes: > > - push a baseline to try with '-b o -p linux64 -u none -t none -awsy > nstarrayboundschecks' > - push bounds check patch to try with '-b o -p linux64 -u none -t none -awsy > nstarrayboundschecks' > - eventually view results at > https://areweslimyet.com/?series=nstarrayboundschecks&evenspacing=1' Isn't that only useful for memory usage concerns though? This change should not affect the memory usage, as far as I can tell.

Flags: needinfo?(birunthan)

Biru [:poiru]

Comment 7

•

10 years ago

(In reply to :Ehsan Akhgari (not reading bugmail, needinfo? me!) from comment #6) > Isn't that only useful for memory usage concerns though? This change should > not affect the memory usage, as far as I can tell. Gah, of course. Sorry, I will pay more attention in the future :)

Flags: needinfo?(birunthan)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 8

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

New patch which annotates the crash reports caused by bounds checking on nsTArray.

Attachment #8605866 - Attachment is obsolete: true

Attachment #8616152 - Flags: feedback?(ehsan)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 9

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

Attachment #8616152 - Attachment is obsolete: true

Attachment #8616152 - Flags: feedback?(ehsan)

Attachment #8616692 - Flags: feedback?(ehsan)

(no longer active)

Reporter

Comment 10

•

10 years ago

Comment on attachment 8616692 [details] [diff] [review] Add non-debug mode bounds checking to nsTArray Review of attachment 8616692 [details] [diff] [review]: ----------------------------------------------------------------- Looks great to me! ::: xpcom/glue/nsTArray.cpp @@ +12,5 @@ > > +// We can only use the crash reporter when we are linking directly with libxul > +// and the crash reporter is enabled > +#if defined(MOZ_CRASHREPORTER) && defined(MOZILLA_INTERNAL_API) && \ > + !defined(MOZILLA_EXTERNAL_LINKAGE) In order to not repeat this sequence below, you can #define a macro here, and make the inclusion of nsExceptionHandler, and the code block below depend on that instead.

Attachment #8616692 - Flags: feedback?(ehsan) → feedback+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 11

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

Made #if-s less repetitive

Attachment #8616692 - Attachment is obsolete: true

Nika Layzell [:nika] (ni? for response)

Assignee

Updated

•

10 years ago

Attachment #8616719 - Flags: feedback?(dmajor)

(Away)

Comment 12

•

10 years ago

Comment on attachment 8616719 [details] [diff] [review] Add non-debug mode bounds checking to nsTArray Review of attachment 8616719 [details] [diff] [review]: ----------------------------------------------------------------- I think this is a good start. I'd recommend elaborating the messages to make it more clear on a crash report what's going on, e.g. "invalid array index" -> "Accessing invalid array index" "invalid aStart index" -> "Removing at invalid aStart index" etc. I suspect that once the crash reports start coming in, you'll find yourself hungry for more information. Why did this crash happen? Was aIndex too large by one? By ten? Did it underflow to 0xffffffff? Was it the poison value 0x5a5a5a5a? Usually in those situations, people set a needinfo flag for me to dig through minidumps and find those values, but that's a slow process. Perhaps we could automate this by doing a printf-style message, like maybe: NSTARRAY_BOUNDS_ASSERT(aIndex < Length(), "Accessing invalid index %u, length %u", aIndex, Length()); What do y'all think?

Attachment #8616719 - Flags: feedback?(dmajor) → feedback+

(no longer active)

Reporter

Comment 13

•

10 years ago

I think that's a great idea!

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 14

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

Updated error message to include the values which caused the bound check failure. Do you like the new error messages?

Attachment #8616719 - Attachment is obsolete: true

Attachment #8616860 - Flags: feedback?(dmajor)

(Away)

Comment 15

•

10 years ago

Comment on attachment 8616860 [details] [diff] [review] Add non-debug mode bounds checking to nsTArray Review of attachment 8616860 [details] [diff] [review]: ----------------------------------------------------------------- Nice! You probably don't need to mention "nsTArray boundcheck" since that's already in the crash annotation key.

Attachment #8616860 - Flags: feedback?(dmajor) → feedback+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 16

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

I'm not sure if there is anything more I should do with the patch before we try to land it. try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f59d5ce609cd

Attachment #8616860 - Attachment is obsolete: true

Attachment #8616941 - Flags: review?(ehsan)

(no longer active)

Reporter

Comment 17

•

10 years ago

Comment on attachment 8616941 [details] [diff] [review] Add non-debug mode bounds checking to nsTArray Review of attachment 8616941 [details] [diff] [review]: ----------------------------------------------------------------- This looks good, but Nathan should do the final review, I think. Do you have Talos measurements? Did this move any needles? ::: xpcom/glue/nsTArray.cpp @@ +40,5 @@ > + // We don't use MOZ_CRASH, because we don't have the error message as a string literal, > + // instead it has been passed in as a const char* > + MOZ_ReportCrash(aMessage, aFilename, aLine); > + MOZ_REALLY_CRASH(); > +} Nit: for hygiene purposes (because of unified builds), please #undef at the end of the file. ::: xpcom/glue/nsTArray.h @@ +18,5 @@ > #include "mozilla/ReverseIterator.h" > #include "mozilla/TypeTraits.h" > > #include <string.h> > +#include <inttypes.h> Please include "mozilla/IntegerPrintfMacros.h" instead (see the comments in that header for why.) @@ +99,5 @@ > + > +#define NSTARRAY_BOUNDS_ASSERT(expr, fmt_str, ...) \ > + if (MOZ_UNLIKELY(!(expr))) { \ > + char msg[1024]; \ > + snprintf(msg, sizeof(msg), fmt_str, __VA_ARGS__); \ Nit: whitespace

Attachment #8616941 - Flags: review?(nfroyd)

Attachment #8616941 - Flags: review?(ehsan)

Attachment #8616941 - Flags: feedback+

Nathan Froyd [:froydnj]

Comment 18

•

10 years ago

Comment on attachment 8616941 [details] [diff] [review] Add non-debug mode bounds checking to nsTArray Review of attachment 8616941 [details] [diff] [review]: ----------------------------------------------------------------- I don't have any problems with the patch per se, though you should probably redo your try push from Monday, since it looks like it got caught in infra problems, or something. Building on all platforms and testing on one is good practice for patches like these: -b do -p all -u all[x64] -t none will only test on x86-64 Linux. I will reiterate Ehsan's point about Talos measurements, or really, measurements of any kind, because I don't see any reviewing the history of the bug. Do you have any measurements on a possible performance hit from this change? r=me with Ehsan's comments and the below comments addressed, but landing this is conditional on performance measurements. ::: xpcom/glue/nsTArray.cpp @@ +7,5 @@ > #include <string.h> > #include "nsTArray.h" > #include "nsXPCOM.h" > #include "nsDebug.h" > #include "mozilla/CheckedInt.h" Please explicitly include "mozilla/Assertions.h" here. ::: xpcom/glue/nsTArray.h @@ +99,5 @@ > + > +#define NSTARRAY_BOUNDS_ASSERT(expr, fmt_str, ...) \ > + if (MOZ_UNLIKELY(!(expr))) { \ > + char msg[1024]; \ > + snprintf(msg, sizeof(msg), fmt_str, __VA_ARGS__); \ Please include "mozilla/Snprintf.h" and use snprintf_literal here.

Attachment #8616941 - Flags: review?(nfroyd) → review+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 19

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

I'm not sure how this slipped under my radar, but I made these changes when you submitted your response and never pushed them >.>. I'll have talos measurements up on here by the end of the day.

Attachment #8616941 - Attachment is obsolete: true

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 20

•

10 years ago

I appear to be causing some "hazard"s, which, as far as I can tell, are due to the increased complexity of the nsTArray indexing operations - it seems as though some static/runtime analysis tool has noticed that an unrooted javascript reference could be kept alive over the call to ArrayBoundsCheckFailed, which theoretically could trigger a GC? Here's the try log: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/michael@thelayzells.com-6410745db362/try-linux64-br-haz/linux64-br-haz_try_dep-bm76-try1-build153.txt.gz, and the hazard log: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/michael@thelayzells.com-6410745db362/try-linux64-br-haz/hazards.txt.gz Any suggestions about how I could get around this (as I'm pretty sure that there is no way ArrayBoundsCheckFailed could trigger a GC, unless the crash reporter can trigger GCs, and if it did, then the program will crash immediately before any pointers could be dereferenced).

Flags: needinfo?(ehsan)

Andrew McCreight [:mccr8]

Comment 21

•

10 years ago

I'm guessing the hazard is because the analysis can't tell that ArrayBoundsCheckFailed will never return. I'm not sure if there's some annotation you can add to let the analysis know that. sfink might have some ideas.

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 22

•

10 years ago

I could pull the MOZ_REALLY_CRASH out of the ArrayBoundsCheckFailed call and into the caller (immediately after the call to ArrayBoundsCheckFailed)? I'm not sure if that would make the analysis happy (as I'm not sure how it works). Marking ArrayBoundsCheckFailed as never returning would probably be better, I'll look at the MFBT annotations to see if I can find one.

Steve Fink [:sfink] [:s:]

Comment 23

•

10 years ago

(In reply to Michael Layzell [:mystor] from comment #22) > I could pull the MOZ_REALLY_CRASH out of the ArrayBoundsCheckFailed call and > into the caller (immediately after the call to ArrayBoundsCheckFailed)? I'm > not sure if that would make the analysis happy (as I'm not sure how it > works). I don't think it would. Even if I extended the analysis to understand noreturn functions, the code in the current patch would be fine. > Marking ArrayBoundsCheckFailed as never returning would probably be better, > I'll look at the MFBT annotations to see if I can find one. Sadly, it wouldn't help anyway. The analysis doesn't understand noreturn functions. Or least, I don't think it does -- it's possible that the information ends up getting used for the control flow graph that the analysis sees. That would be a fairly easy thing to add -- it would just be a matter of killing all variables' live ranges if you ever called a noreturn function -- but given that this is the first time I've seen it needed, I'm inclined to just annotate this away. It would probably be safe to claim that CrashReporter::AnnotateCrashReport never GCs, but I don't know that for 100% sure -- for all I know, there could be cases where we generate a crash report but don't actually crash. So for now, I'd go conservative and just lie and claim that ArrayBoundsCheckFailed can never GC. I have messy stuff in my working directory at the moment, but if you add it to ignoreFunctions in js/src/devtools/rootAnalysis/annotations.js, it'll be fine. The patch will look like: diff --git a/js/src/devtools/rootAnalysis/annotations.js b/js/src/devtools/rootAnalysis/annotations.js --- a/js/src/devtools/rootAnalysis/annotations.js +++ b/js/src/devtools/rootAnalysis/annotations.js @@ -184,16 +184,19 @@ var ignoreFunctions = { // Similar to heap snapshot mock classes, and GTests below. This posts a // synchronous runnable when a GTest fails, and we are pretty sure that the // particular runnable it posts can't even GC, but the analysis isn't // currently smart enough to determine that. In either case, this is (a) // only in GTests, and (b) only when the Gtest has already failed. We have // static and dynamic checks for no GC in the non-test code, and in the test // code we fall back to only the dynamic checks. "void test::RingbufferDumper::OnTestPartResult(testing::TestPartResult*)" : true, + + // This function will not return so any GCs it does are irrelevant. + "void ArrayBoundsCheckFailed(int8*, int8*, int32)": true, }; function isProtobuf(name) { return name.match(/\bgoogle::protobuf\b/) || name.match(/\bmozilla::devtools::protobuf\b/); } and I can review it. Maybe I'll look into querying attributes, but it'll require plumbing, and there's no need for you to be blocked on it.

(no longer active)

Reporter

Comment 24

•

10 years ago

I think Steve answered the question. :-) Clearing the ni.

Flags: needinfo?(ehsan)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 25

•

10 years ago

Attached patch Add non-debug mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

Thanks. I decided to mark the function as MOZ_NORETURN anyways, because, well, it doesn't return. Here's a try for the hazards (hopefully they're gone now): https://treeherder.mozilla.org/#/jobs?repo=try&revision=de2a3b3da818 Also, hopefully I should have the performace measurements in a few hours (finally).

Attachment #8624832 - Attachment is obsolete: true

Attachment #8626186 - Flags: review?(sphink)

Steve Fink [:sfink] [:s:]

Updated

•

10 years ago

Attachment #8626186 - Flags: review?(sphink) → review+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 26

•

10 years ago

There is definitely a perf regression, here are the results: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=94465605689e&newProject=try&newRevision=6410745db362 I'm not sure if these performance losses are acceptable.

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 27

•

10 years ago

IIRC there is a way to run the Talos tests under the Gecko Profiler on try. Any chance you can check with Avi how that can be done, and get a profile from before and after to see if something very hot shows up in these profiles? Also, can you please do a no-op try push on top of 94465605689e to see if that moves any needles? :-)

Flags: needinfo?(ehsan)

(Away)

Comment 28

•

10 years ago

On Windows in particular, I think we should do Talos comparisons on PGO builds. The failure-paths in these macros should be extremely cold and PGO should produce better code than a plain MSVC build.

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 29

•

10 years ago

I've pushed a profiler run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f50c94bc5084 I'm not exactly sure what a PGO build is, but doing a Talos comparison on PGO builds should be doable.

Andrew McCreight [:mccr8]

Comment 30

•

9 years ago

Attached file most frequent ElementAt calls at startup and while idle — Details

I wrote a little patch that logs the first few stack frames every time you call ElementAt(), then I wrote a script to analyze that to come up with the most frequent call sites. Don't take this too seriously, as it is all rather hacky, but it might be a good starting point. I was thinking we could add an UnsafeElementAt() method and call that for these hot spots. The top 7 in this log account for about 70% of the call sites. (Note that number 2 is debug-only: this was taken in a debug build, which is another reason to take this with a grain of salt.) I also need to try this on something besides startup.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1211736

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 31

•

9 years ago

Once bug 1183355 lands, this patch could be rewritten to just use `if (MOZ_UNLIKELY(i > Length())) { MOZ_CRASH("nsTArray index out of bounds"); }`, which would probably simplify a lot of things. We wouldn't get some handy information (such as the index which was attempted to be accessed and the length of the nsTArray), but it may be worth it.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1183355

Jesse Ruderman

Updated

•

9 years ago

Keywords: sec-want

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 32

•

9 years ago

Attached patch Check nsTArray ElementAt indexing in release builds (obsolete) — Details — Splinter Review

This is a simplified version of the patch which takes advantage of the work done in bug 1211979. It basically just s/MOZ_ASSERT/MOZ_RELEASE_ASSERT in ElementAt, and introduces UncheckedElementAt for performance-sensitive code.

Attachment #8626186 - Attachment is obsolete: true

Attachment #8671657 - Flags: review?(sphink)

Steve Fink [:sfink] [:s:]

Comment 33

•

9 years ago

Comment on attachment 8671657 [details] [diff] [review] Check nsTArray ElementAt indexing in release builds Review of attachment 8671657 [details] [diff] [review]: ----------------------------------------------------------------- lgtm, but I'm not a peer here. In glancing at the code, it *looked* like index_type is always size_t, but I wonder if it would be worth a static_assert to say that index_type is unsigned. That made me nervous at first glance. static_assert(index_type(-1) > 0) or something.

Attachment #8671657 - Flags: review?(sphink) → review?(nfroyd)

Emanuel Hoogeveen [:ehoogeveen]

Comment 34

•

9 years ago

(In reply to Steve Fink [:sfink, :s:] from comment #33) > static_assert(index_type(-1) > 0) or something. Probably want static_assert(mozilla::IsUnsigned<index_type>::value) (from mozilla/TypeTraits.h)

Nathan Froyd [:froydnj]

Comment 35

•

9 years ago

Comment on attachment 8671657 [details] [diff] [review] Check nsTArray ElementAt indexing in release builds Review of attachment 8671657 [details] [diff] [review]: ----------------------------------------------------------------- As before, I'm fine with this in principle; do we have actual performance numbers yet? (I see a Talos push, but it looks...colorful, and I'm not sure what it actually tells me relative to a baseline (or even what the baseline is).)

Attachment #8671657 - Flags: review?(nfroyd) → review+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 36

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #35) > Comment on attachment 8671657 [details] [diff] [review] > Check nsTArray ElementAt indexing in release builds > > Review of attachment 8671657 [details] [diff] [review]: > ----------------------------------------------------------------- > > As before, I'm fine with this in principle; do we have actual performance > numbers yet? (I see a Talos push, but it looks...colorful, and I'm not sure > what it actually tells me relative to a baseline (or even what the baseline > is).) Nope - we don't have perf numbers yet, which is why I'm not planning to land this until we get something like bug 1211736. I just wanted to have a more up-to-date and simpler patch attached which could be used for perf testing.

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 37

•

9 years ago

Now that I know a little more about what I am doing than I did when I originally worked on this, I went back and ran a quick Talos test with the most trivial change which I could think of (changing MOZ_ASSERT to MOZ_RELEASE_ASSERT in the indexing operations). The results are here: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3bfeffee70871b623a62c1cb1cac0c53ef0157b2&newProject=try&newRevision=8a3c37200ba9a4b9ee14fad0e9e3a5fc2e9c38df&framework=1&showOnlyImportant=0 It definitely seems like a somewhat significant performance regression on some of the tests.

Nathan Froyd [:froydnj]

Comment 38

•

9 years ago

That it's only a11y tests seems quite odd to me; what's so special about a11y tests that bounds checking slows them down, but not something like page load tests? If we're really confident that our Talos tests are a good measure of performance regressions, I think we should have someone investigate, or have an a11y person sign off on the small performance regression. And then we should land this.

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 39

•

9 years ago

Attached patch Add release mode bounds checking to nsTArray (obsolete) — Details — Splinter Review

This patch is different from the one it is replacing in that it doesn't add any UncheckedElementAt methods. Those can be added if it is discovered they are needed for performance purposes.

Attachment #8779823 - Flags: review?(nfroyd)

Nika Layzell [:nika] (ni? for response)

Assignee

Updated

•

9 years ago

Attachment #8671657 - Attachment is obsolete: true

Nathan Froyd [:froydnj]

Comment 40

•

9 years ago

Comment on attachment 8779823 [details] [diff] [review] Add release mode bounds checking to nsTArray Review of attachment 8779823 [details] [diff] [review]: ----------------------------------------------------------------- Let's try it; maybe we'll have to downgrade to MOZ_DIAGNOSTIC_ASSERT at some point if the crashes are just terrible.

Attachment #8779823 - Flags: review?(nfroyd) → review+

Nathan Froyd [:froydnj]

Comment 41

•

9 years ago

Are we abandoning the approach that provided more information in the messages?

Flags: needinfo?(michael)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 42

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #41) > Are we abandoning the approach that provided more information in the > messages? I figured I would start with the simplest solution to the problem, which was to just change from a MOZ_ASSERT to MOZ_RELEASE_ASSERT. I don't imagine it would be too hard to provide more information as well, such as what the index was and what the capacity was. I'll attach another patch and also do a try run with it, just to check if adding that extra info degrades performance.

Flags: needinfo?(michael)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 43

•

9 years ago

Attached patch Add release mode bounds checking with custom annotations to nsTArray — Details — Splinter Review

This is my pass at implementing custom annotations for these nsTArray crashes. I just store the index and length in the MozCrashReason field, as the value which would have been stored there was very useless.

Attachment #8780240 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Updated

•

9 years ago

Attachment #8780240 - Flags: review?(nfroyd) → review+

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 44

•

9 years ago

This is the talos run for the custom annotations code: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3bfeffee70871b623a62c1cb1cac0c53ef0157b2&newProject=try&newRevision=9e7f1b107d1231f2fffda60a176bc0e9cdf45684&framework=1&showOnlyImportant=0 froydnj: Is this OK to land as-is, or do we need to do some perf work before we can land it?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 45

•

9 years ago

The tp5o responsiveness opt e10s test on Windows looks pretty bad...I wonder what happened there?

Flags: needinfo?(nfroyd)

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 46

•

9 years ago

(In reply to Nathan Froyd [:froydnj] from comment #45) > The tp5o responsiveness opt e10s test on Windows looks pretty bad...I wonder > what happened there? I'm re-executing those tests 5 more times to get some more precision just to confirm that it's real.

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 47

•

9 years ago

After re-triggering more times, that test failure seems to have disappeared.

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 48

•

9 years ago

(In reply to Michael Layzell [:mystor] from comment #47) > After re-triggering more times, that test failure seems to have disappeared. Consistent performance tests are so great. Let's land it!

Flags: needinfo?(nfroyd)

Pulsebot

Comment 49

•

9 years ago

Pushed by michael@thelayzells.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0c17b64aac66 Add release mode bounds checking with custom annotations to nsTArray, r=froydnj

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 50

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0c17b64aac66

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox51: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla51

Andrew McCreight [:mccr8]

Comment 51

•

9 years ago

Could you consider uplifting this to Aurora once it seems stable? It seems like it will fix a number of hidden security issues, so it would be nice to get it out a release earlier if possible.

Flags: needinfo?(michael)

Nika Layzell [:nika] (ni? for response)

Assignee

Updated

•

9 years ago

Attachment #8779823 - Attachment is obsolete: true

Nika Layzell [:nika] (ni? for response)

Assignee

Comment 52

•

9 years ago

Comment on attachment 8780240 [details] [diff] [review] Add release mode bounds checking with custom annotations to nsTArray I don't really see a good reason to not uplift if we're OK with causing some more crashes in aurora in exchange for finding & fixing these bugs faster. Approval Request Comment [Feature/regressing bug #]: No bug [User impact if declined]: Potentially sec-critical bugs due to out of bounds array accesses will not be caught in aurora builds. [Describe test coverage new/current, TreeHerder]: None - changes a MOZ_ASSERT to a release assert and annotates the crash report with useful information. nsTArray is used a lot in our test code. [Risks and why]: This will likely increase crashes in aurora due to out of bounds array accesses crashing instead of being a sec bug. The code itself is fairly low risk, and has been working in nightly. [String/UUID change made/needed]: None

Flags: needinfo?(michael)

Attachment #8780240 - Flags: approval-mozilla-aurora?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

9 years ago

status-firefox50: --- → affected

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 53

•

9 years ago

Comment on attachment 8780240 [details] [diff] [review] Add release mode bounds checking with custom annotations to nsTArray It's a hard trade-off, crashiness vs. being more secure. I am ok to uplift this patch and see how much this negatively impacts aurora stability. If things are really bad stability wise, we may need to revisit this decision and perhaps back this one out and let it ride the 51 train.

Attachment #8780240 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 54

•

9 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/20f871385189

status-firefox50: affected → fixed

Nicholas Nethercote [inactive]

Comment 55

•

8 years ago

Here's a search that shows all the crashes caused by failed bounds checks in the past 7 days: https://crash-stats.mozilla.com/search/?moz_crash_reason=~ElementAt&product=Firefox&_sort=-date&_facets=signature&_facets=moz_crash_reason&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature > 1 InvalidArrayIndex_CRASH | mozilla::a11y::DocAccessibleChildBase::SetMsaaIds Add term 403 84.13 % 1304130 > 2 InvalidArrayIndex_CRASH | mozilla::EventListenerManager::GetListenerInfo Add term 19 3.97 % > 3 InvalidArrayIndex_CRASH | mozilla::ContentCache::TextRectArray::GetUnionRectAsFarAsPossible Add term 16 3.34 % 1291082 > 4 InvalidArrayIndex_CRASH | mozilla::MediaEngineCameraVideoSource::GetCapability Add term 14 2.92 % > 5 InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | mozilla::widget::IMMHandler::DispatchCompositionChangeEvent Add term 9 1.88 % > 6 InvalidArrayIndex_CRASH | mozilla::dom::AudioNode::DisconnectFromOutputIfConnected Add term 4 0.84 % > 7 InvalidArrayIndex_CRASH | mozilla::gfx::impl::VRDisplayOculus::GetNextRenderTarget Add term 3 0.63 % 1297105 > 8 InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | mozilla::widget::IMMHandler::HandleComposition Add term 3 0.63 % > 9 InvalidArrayIndex_CRASH | nsUrlClassifierPrefixSet::GetPrefixesNative Add term 3 0.63 % > 10 xul.dll@0x164dd1 | xul.dll@0xfe2efc | xul.dll@0x1169f2a | xul.dll@0x116eb35 | xul.dll@0x11700b4 | xul.dll@0x117f142 | xul.dll@0x1189b94 | xul.dll@0x11904af | xul.dll@0x1194ffb | xul.dll@0x11408b | xul.dll@0x1194f0a | InternalCallWinProc Add term 2 0.42 % If you click on the "Moz crash reason facet" tab, you can see this: > 1 ElementAt(aIndex = 3, aLength = 3) 109 22.76 % > 2 ElementAt(aIndex = 2, aLength = 2) 86 17.95 % > 3 ElementAt(aIndex = 1, aLength = 1) 77 16.08 % > 4 ElementAt(aIndex = 4, aLength = 4) 38 7.93 % > 5 ElementAt(aIndex = 5, aLength = 5) 36 7.52 % > 6 ElementAt(aIndex = 6, aLength = 6) 35 7.31 % > 7 ElementAt(aIndex = 4294967295, aLength = 0) 17 3.55 % > 8 ElementAt(aIndex = 0, aLength = 0) 8 1.67 % > 9 ElementAt(aIndex = 27, aLength = 27) 8 1.67 % > 10 ElementAt(aIndex = 10, aLength = 10) 7 1.46 % Probably the most interesting thing there is #7.

Nicholas Nethercote [inactive]

Comment 56

•

8 years ago

Bug 1304130, bug 1291082, and bug 1297105 were already filed for three of those top 10 crash signatures I mentioned. For bug 1291082 we already had the crash on file, but the crash signature changed when bounds checking was introduced, which is good. (That bug covers the case where the index is 4294967295.) I also filed bug 1304596, bug 1304597, bug 1304599, bug 1304601, and 1304602. That gives us bugs for 8 of the top 10. Anyway, nice job implementing this! It's catching real bugs in the wild :)

Kent James (:rkent)

Updated

•

8 years ago

Depends on: 1312136

Al Billings [:abillings - ex-MoCo]

Updated

•

8 years ago

Whiteboard: [adv-main50-]

Wayne Mery (:wsmwk)

Updated

•

8 years ago

Updated

•

5 years ago