Closed Bug 1234280 Opened 8 years ago Closed 8 years ago

Assertion failure: aIndex < mLength, at ../../dist/include/mozilla/Vector.h:451 with OOM

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla46
Tracking Status
firefox42 --- wontfix
firefox43 --- wontfix
firefox44 + verified
firefox45 + verified
firefox46 + verified
firefox-esr38 44+ fixed
firefox-esr45 --- fixed
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- affected
b2g-v2.5 --- fixed
b2g-v2.2r --- affected
b2g-master --- fixed

People

(Reporter: decoder, Assigned: bbouvier)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [jsbugmon:update][adv-main44+][adv-esr38.6+])

Attachments

(3 files, 1 obsolete file)

The following testcase crashes on mozilla-central revision 388bdc46ba51 (build with --enable-optimize --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --target=i686-pc-linux-gnu --disable-tests --enable-debug, run with --fuzzing-safe --thread-count=2 --ion-eager):

oomTest(() => unescape({
    thisprops: gc() && -new(function(x) {}).TestCase
}));



Backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf76cfb40 (LWP 51416)]
0x08466a1d in operator[] (aIndex=272, this=0xf577fc90) at ../../dist/include/mozilla/Vector.h:451
#0  0x08466a1d in operator[] (aIndex=272, this=0xf577fc90) at ../../dist/include/mozilla/Vector.h:451
#1  js::jit::CodeGeneratorShared::addCacheLocations (this=this@entry=0xf577f000, locs=..., numLocs=numLocs@entry=0xf76cefc0) at js/src/jit/shared/CodeGenerator-shared.cpp:1629
#2  0x082bc48f in js::jit::CodeGenerator::visitGetPropertyIC (this=0xf577f000, ool=ool@entry=0xf5779720, ic=...) at js/src/jit/CodeGenerator.cpp:8693
#3  0x082d4b18 in js::jit::OutOfLineUpdateCache::visitGetPropertyIC (this=0xf5779720, codegen=0xf577f000) at js/src/jit/CodeGenerator.cpp:105
#4  0x08367097 in js::jit::GetPropertyIC::accept (this=0xf7a6faa8, codegen=0xf577f000, visitor=0xf5779734) at js/src/jit/IonCaches.h:431
#5  0x0829504f in js::jit::CodeGenerator::visitOutOfLineCache (this=0xf577f000, ool=0xf5779720) at js/src/jit/CodeGenerator.cpp:146
#6  0x082c2988 in js::jit::OutOfLineCodeBase<js::jit::CodeGenerator>::generate (this=0xf5779720, codegen=0xf577f000) at js/src/jit/shared/CodeGenerator-shared.h:580
#7  0x08467b7c in js::jit::CodeGeneratorShared::generateOutOfLineCode (this=this@entry=0xf577f000) at js/src/jit/shared/CodeGenerator-shared.cpp:182
#8  0x0848dbcf in js::jit::CodeGeneratorX86Shared::generateOutOfLineCode (this=this@entry=0xf577f000) at js/src/jit/x86-shared/CodeGenerator-x86-shared.cpp:409
#9  0x082c2857 in js::jit::CodeGenerator::generate (this=this@entry=0xf577f000) at js/src/jit/CodeGenerator.cpp:8021
#10 0x082dcd3c in js::jit::GenerateCode (mir=mir@entry=0xf7aba100, lir=0xf7abd400) at js/src/jit/Ion.cpp:1974
#11 0x0832647f in js::jit::CompileBackEnd (mir=mir@entry=0xf7aba100) at js/src/jit/Ion.cpp:1996
#12 0x086942f8 in js::HelperThread::handleIonWorkload (this=this@entry=0xf7a40800) at js/src/vm/HelperThreads.cpp:1263
#13 0x08695d28 in js::HelperThread::threadLoop (this=0xf7a40800) at js/src/vm/HelperThreads.cpp:1581
#14 0x08715051 in nspr::Thread::ThreadRoutine (arg=0xf7a02180) at js/src/vm/PosixNSPR.cpp:45
#15 0xf7fb0f70 in start_thread (arg=0xf76cfb40) at pthread_create.c:312
#16 0xf7d7a4ce in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129
eax	0x0	0
ebx	0x9803430	159396912
ecx	0xf7e3b88c	-136071028
edx	0x0	0
esi	0xf577f000	-176689152
edi	0xf7abba8c	-139740532
ebp	0xf76cef68	4151111528
esp	0xf76cef00	4151111424
eip	0x8466a1d <js::jit::CodeGeneratorShared::addCacheLocations(js::jit::CacheLocationList const&, unsigned int*)+701>
=> 0x8466a1d <js::jit::CodeGeneratorShared::addCacheLocations(js::jit::CacheLocationList const&, unsigned int*)+701>:	movl   $0x1c3,0x0
   0x8466a27 <js::jit::CodeGeneratorShared::addCacheLocations(js::jit::CacheLocationList const&, unsigned int*)+711>:	call   0x80fffc0 <abort()>


Marking this one s-s because the assertion indicates a length violation.
Worst case is sec-high if this can be exploited and the out-of-bounds allows you to add some interesting cache stuff.
Keywords: sec-high
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/e1ce5b4fa814
user:        Brian Grinstead
date:        Thu Dec 17 17:00:28 2015 -0800
summary:     Bug 1233127 - Use createProcessingIntruction instead of addon-sdk to load stylesheets in tools;r=pbrosset

This iteration took 393.874 seconds to run.
Bisection result is bogus, retriggering.
Whiteboard: [jsbugmon:update] → [jsbugmon:update,bisect]
Attached patch oom.patch (obsolete) — Splinter Review
allocateData's callers should check for oom, this wasn't the case before this patch.
Looking at the code around, I inlined allocateData(IonCache&, size_t) into its unique caller, using type traits templating instead of the type system checking.

Also, as visitGetPropertyIC calls addCacheLocations which now returns a sentinel value in case of OOM, we need to return eagerly if the sentinel value is caught; otherwise we might set the location index as SIZE_MAX, and then when reading the location index in GetPropertyIC::allowArrayLength, we might hit another assert in IonScript::getCacheLocs. Duh, I'm starting to feel that oom flags suck and using outparams + returning bool would actually be cleaner (we could even use MOZ_WARN_UNUSED_RESULT)...
Attachment #8701748 - Flags: review?(jdemooij)
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/6a3714198c65
user:        Gabriel Luong
date:        Wed Dec 09 11:30:00 2015 -0800
summary:     Bug 1231362 - Part 7: Update CodeMirror README for version 5.9.0 r=bgrins

This iteration took 345.854 seconds to run.
The bisection ranges I'd say are probably unreliable.
Comment on attachment 8701748 [details] [diff] [review]
oom.patch

Review of attachment 8701748 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay.

::: js/src/jit/CodeGenerator.cpp
@@ +8693,5 @@
>          size_t numLocs;
>          CacheLocationList& cacheLocs = lir->mirRaw()->toGetPropertyCache()->location();
>          size_t locationBase = addCacheLocations(cacheLocs, &numLocs);
> +        if (locationBase == SIZE_MAX)
> +            return;

Yeah, I agree with the last part of comment 4, it'd probably be more obviously correct to use a bool return value + size_t outparam instead of SIZE_MAX.
Attachment #8701748 - Flags: review?(jdemooij)
Attached patch fix.patchSplinter Review
And with MOZ_WARN_UNUSED_RESULT as a bonus for future callers.
Assignee: nobody → bbouvier+bugzilla
Attachment #8701748 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8703127 - Flags: review?(jdemooij)
Comment on attachment 8703127 [details] [diff] [review]
fix.patch

Review of attachment 8703127 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8703127 - Flags: review?(jdemooij) → review+
Comment on attachment 8703127 [details] [diff] [review]
fix.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
It would be quite difficult: one would have to trigger an out-of-memory condition at a very particular location and detect at which particular offset in the vector it triggered.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No, I wouldn't say so (the test isn't included in the patch).

Which older supported branches are affected by this flaw?
No idea (bisection ranges are unreliable). The affected function, addCacheLocations, seems to date from bug 905999, in which case probably all branches would be affected.

If not all supported branches, which bug introduced the flaw?
bug 905999.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
I think the same patch could apply on branches, if not it would not be hard at all to make backports. It shouldn't be more risky than landing on inbound.

How likely is this patch to cause regressions; how much testing does it need?
No regressions expected. Local testing passes all tests. Landing on inbound with or without test case should be good enough (note that current patch doesn't contain the test case).
Attachment #8703127 - Flags: sec-approval?
sec-approval+.

Please create and nominate patches for ESR38, Aurora, and Beta so we can check this in on those branches after this clears trunk.
Attachment #8703127 - Flags: sec-approval? → sec-approval+
Thanks. Pushed to inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fcc73febb20c

ni?me for the uplifts.
Flags: needinfo?(bbouvier)
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
Comment on attachment 8703127 [details] [diff] [review]
fix.patch

Approval Request Comment
[Feature/regressing bug #]: bug 905999
[User impact if declined]: security issue (although really hard to trigger), see comment 10
[Describe test coverage new/current, TreeHerder]: green on treeherder for a couple of days
[Risks and why]: very low 
[String/UUID change made/needed]: n/a
Flags: needinfo?(bbouvier)
Attachment #8703127 - Flags: approval-mozilla-esr38?
Attachment #8703127 - Flags: approval-mozilla-beta?
Attachment #8703127 - Flags: approval-mozilla-aurora?
Comment on attachment 8703127 [details] [diff] [review]
fix.patch

sec-high, taking it on all branches.
Attachment #8703127 - Flags: approval-mozilla-esr38?
Attachment #8703127 - Flags: approval-mozilla-esr38+
Attachment #8703127 - Flags: approval-mozilla-beta?
Attachment #8703127 - Flags: approval-mozilla-beta+
Attachment #8703127 - Flags: approval-mozilla-aurora?
Attachment #8703127 - Flags: approval-mozilla-aurora+
JSBugMon: This bug has been automatically verified fixed on Fx45
Flags: in-testsuite? → in-testsuite-
failed to apply to esr38:

Tomcats-MacBook-Pro-2:mozilla-central Tomcat$ hg graft --edit -r 486e6901c1d9
grafting 322912:486e6901c1d9 "Bug 1234280 - Handle oom in CodeGeneratorShared::allocateData. r=jandem, a=sledru"
merging js/src/jit/CodeGenerator.cpp
merging js/src/jit/shared/CodeGenerator-shared.cpp
merging js/src/jit/shared/CodeGenerator-shared.h
warning: conflicts while merging js/src/jit/shared/CodeGenerator-shared.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use hg resolve and hg graft --continue)

could you take a look, thanks!
Flags: needinfo?(bbouvier)
Yes, made a branch specific patch and pushed.

https://hg.mozilla.org/releases/mozilla-esr38/rev/0f7224441f20
Flags: needinfo?(bbouvier)
Group: javascript-core-security → core-security-release
Actually, bug 1233925 also landed in that window, but this patch is about generators and the signature has "generate" in it, so it's my primary suspect.
Benjamin, Jan (for patch from bug 1233925): Would you be able to look into the JIT crash spike KaiRo mentioned in comment 24? This is a high-priority ask as Fx44 enters RC on Monday.
Flags: needinfo?(jdemooij)
Flags: needinfo?(bbouvier)
Can't tell much what's going on, whether it's related to this patch or not. Unfortunately, the stack doesn't give us valuable information:

- The patch in this bug changes things under CodeGenerator::generate, which happens in an independent phase *after* the LIRGenerator::generate call (that is, LIRGenerator::generate doesn't call into CodeGenerator::generate). If there was an issue with this patch, I would hope one stack frame at the bottom would contain CodeGenerate::generate or something calling in or calling it.
- The patch in bug 1233925 changes things in GC marking, which can happen at all times.

All crash reports come from Windows NT, but that might be a coincidence, if the beta population is biased towards using Windows.

Interestingly, all the crashing addresses have the format 0xXXXX1cef (e.g. 0xdc7e1cef).

I don't have any more ideas, but I'll keep on investigating tomorrow.
(In reply to Ritu Kothari (:ritu) from comment #26)
> Benjamin, Jan (for patch from bug 1233925): Would you be able to look into
> the JIT crash spike KaiRo mentioned in comment 24? This is a high-priority
> ask as Fx44 enters RC on Monday.

Both patches are pretty simple and touch unrelated code. Both patches landed on Nightly and Aurora too, where we're not seeing those crashes. Because it's Windows-only, I wonder if this is a compiler/PGO bug.

The stack suggests a LIR instruction with a bogus vtable, so we get EXEC crashes.

I'll do some more digging.
Benjamin, maybe we can do a backout of this patch on beta and land the most minimal fix possible (crash on OOM or something).

Bug 1233925 is a sec-crit and unfortunately I don't see a simpler way to fix that one.
(In reply to Jan de Mooij [:jandem] from comment #29)
> Benjamin, maybe we can do a backout of this patch on beta and land the most
> minimal fix possible (crash on OOM or something).
> 
> Bug 1233925 is a sec-crit and unfortunately I don't see a simpler way to fix
> that one.

I'm working on a simpler beta patch, right now. The patch is trivial, but making sure all tests are passing with all flags before pushing.
Approval Request Comment
[Feature/regressing bug #]: this bug, just on beta
[User impact if declined]: random crashes for windows users (comment 44), probably due to a compiler error. Possible culprits are the patch this one backs out and the one in bug 1227287. We hope that the patch here is at fault and that the crashes will disappear with this simpler fix.
[Describe test coverage new/current, TreeHerder]: local testing all green
[Risks and why]: very low/unknown. This is a dumb fix to the initial fix of this bug. However, *if* it is a compiler error, it's possible that the crashes (comment 44) might stay.
[String/UUID change made/needed]: n/a

This backs out the beta patch, and instead uses a very simple fix. Sorry I've folded the two patches together, but really the simpler fix is only 6 lines long (the diff is posted below). Asked Sylvestre on irc, he told me I should reask for review, so doing it here.

Here's the actual 6 lines long patch:

diff --git a/jit/CodeGenerator.cpp b/jit/CodeGenerator.cpp
--- a/jit/CodeGenerator.cpp
+++ b/jit/CodeGenerator.cpp
@@ -8535,16 +8535,18 @@ void
 CodeGenerator::visitGetPropertyIC(OutOfLineUpdateCache* ool, DataPtr<GetPropertyIC>& ic)
 {
     LInstruction* lir = ool->lir();
 
     if (ic->idempotent()) {
         size_t numLocs;
         CacheLocationList& cacheLocs = lir->mirRaw()->toGetPropertyCache()->location();
         size_t locationBase = addCacheLocations(cacheLocs, &numLocs);
+        if (locationBase == SIZE_MAX)
+            return;
         ic->setLocationInfo(locationBase, numLocs);
     }
 
     saveLive(lir);
 
     pushArg(ic->id());
     pushArg(ic->object());
     pushArg(Imm32(ool->getCacheIndex()));
diff --git a/jit/shared/CodeGenerator-shared.cpp b/jit/shared/CodeGenerator-shared.cpp
--- a/jit/shared/CodeGenerator-shared.cpp
+++ b/jit/shared/CodeGenerator-shared.cpp
@@ -1623,16 +1623,18 @@ size_t
 CodeGeneratorShared::addCacheLocations(const CacheLocationList& locs, size_t* numLocs)
 {
     size_t firstIndex = runtimeData_.length();
     size_t numLocations = 0;
     for (CacheLocationList::iterator iter = locs.begin(); iter != locs.end(); iter++) {
         // allocateData() ensures that sizeof(CacheLocation) is word-aligned.
         // If this changes, we will need to pad to ensure alignment.
         size_t curIndex = allocateData(sizeof(CacheLocation));
+        if (masm.oom())
+            return SIZE_MAX;
         new (&runtimeData_[curIndex]) CacheLocation(iter->pc, iter->script);
         numLocations++;
     }
     MOZ_ASSERT(numLocations != 0);
     *numLocs = numLocations;
     return firstIndex;
 }
 
diff --git a/jit/shared/CodeGenerator-shared.h b/jit/shared/CodeGenerator-shared.h
--- a/jit/shared/CodeGenerator-shared.h
+++ b/jit/shared/CodeGenerator-shared.h
@@ -218,16 +218,18 @@ class CodeGeneratorShared : public LElem
     inline Operand ToOperand(const LDefinition* def);
 
   protected:
     // Ensure the cache is an IonCache while expecting the size of the derived
     // class. We only need the cache list at GC time. Everyone else can just take
     // runtimeData offsets.
     size_t allocateCache(const IonCache&, size_t size) {
         size_t dataOffset = allocateData(size);
+        if (masm.oom())
+            return SIZE_MAX;
         masm.propagateOOM(cacheList_.append(dataOffset));
         return dataOffset;
     }
Flags: needinfo?(rkothari)
Flags: needinfo?(jdemooij)
Flags: needinfo?(bbouvier)
Attachment #8707924 - Flags: review?(jdemooij)
Attachment #8707924 - Flags: approval-mozilla-beta?
Comment on attachment 8707924 [details] [diff] [review]
beta-backout-simpler-approach.patch

Review of attachment 8707924 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8707924 - Flags: review?(jdemooij) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #31)
> diff --git a/jit/CodeGenerator.cpp b/jit/CodeGenerator.cpp
> --- a/jit/CodeGenerator.cpp
> +++ b/jit/CodeGenerator.cpp
> @@ -8535,16 +8535,18 @@ void
>  CodeGenerator::visitGetPropertyIC(OutOfLineUpdateCache* ool,
> DataPtr<GetPropertyIC>& ic)
>  {
>      […]
>          size_t locationBase = addCacheLocations(cacheLocs, &numLocs);
> +        if (locationBase == SIZE_MAX)
> +            return;
>          ic->setLocationInfo(locationBase, numLocs);

These 2 lines should not be needed at all.  The reason they are not needed is because the masm.oom() flag is set properly and the out of line path only splat the indexes into the code but never uses these indexes to dereference any data.

> diff --git a/jit/shared/CodeGenerator-shared.h
> b/jit/shared/CodeGenerator-shared.h
> --- a/jit/shared/CodeGenerator-shared.h
> +++ b/jit/shared/CodeGenerator-shared.h
> @@ -218,16 +218,18 @@ class CodeGeneratorShared : public LElem
>          size_t dataOffset = allocateData(size);
> +        if (masm.oom())
> +            return SIZE_MAX;
>          masm.propagateOOM(cacheList_.append(dataOffset));

Identically these 2 lines are not needed for the same reason.

Otherwise this sounds good to me.
This is the simple patch version from comment 31 (assuming that the original
patch got backed out), and without the 4 lines on which I commented in
comment 33.

I verified that I was able to reproduce the original issue after reverting
the original pa
tch on top of beta, and I verified that I was unable to
reproduce this assertion after applying the 2 lines from this patch.
This is the simple patch version from comment 31 (assuming that the original
patch got backed out), and without the 4 lines on which I commented in
comment 33.

I verified that I was able to reproduce the original issue after reverting
the original patch on top of beta, and I verified that I was unable to
reproduce this assertion after applying the 2 lines from this patch.

(original patch reviewed by jandem in comment 32)
(backout the original patch applied on beta before cherry-picking this one)
Attachment #8708019 - Flags: review+
(In reply to Nicolas B. Pierron [:nbp] from comment #34)
> Created attachment 8708019 [details] [diff] [review]
> Handle oom in CodeGeneratorShared::addCacheLocations.
> 
> This is the simple patch version from comment 31 (assuming that the original
> patch got backed out), and without the 4 lines on which I commented in
> comment 33.
> 
> I verified that I was able to reproduce the original issue after reverting
> the original patch on top of beta, and I verified that I was unable to
> reproduce this assertion after applying the 2 lines from this patch.
> 
> (original patch reviewed by jandem in comment 32)
> (backout the original patch applied on beta before cherry-picking this one)

^ demangled comment
Comment on attachment 8708019 [details] [diff] [review]
Handle oom in CodeGeneratorShared::addCacheLocations.

Approval Request Comment
(see comment 15)
[Risks and why]: even lower (less changes, less risks)
Attachment #8708019 - Flags: approval-mozilla-beta?
Hi Eric, Nicolas: I need you to fill out the test coverage (did you manually test this?) and risk assessment section on your uplift request. Without that it's tough to make a decision on whether this is a safe patch to take or not. Thanks!
Flags: needinfo?(rkothari)
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(efaustbmo)
Comment on attachment 8708019 [details] [diff] [review]
Handle oom in CodeGeneratorShared::addCacheLocations.

This is a speculative fix that we are taking to see if it addresses the JIT crash spike we are seeing. Depending on how the situation changes, we might be thinking about backing out some patches if this one doesn't stick. Beta44+
Attachment #8708019 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8707924 [details] [diff] [review]
beta-backout-simpler-approach.patch

We decided to go with the other speculative fix as it was much smaller.
Attachment #8707924 - Flags: approval-mozilla-beta?
(In reply to Ritu Kothari (:ritu) from comment #37)
> Hi Eric, Nicolas: I need you to fill out the test coverage (did you manually
> test this?) and risk assessment section on your uplift request. Without that
> it's tough to make a decision on whether this is a safe patch to take or
> not. Thanks!

I am not sure what you are asking about the test coverage, but I verified manually that I was able to reproduce, and verified that the 2 lines changes is indeed fixing the bug (as mentioned in comment 35)
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(efaustbmo)
The fix seems to have worked in 44.0b9, thanks!
Whiteboard: [jsbugmon:update] → [jsbugmon:update][adv-main44+][adv-esr38.6+]
JSBugMon: This bug has been automatically verified fixed on Fx44
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.