Closed Bug 1013056 Opened 5 years ago Closed 5 years ago

Crash at a weird memory address involving RegExp and filterPar

Categories

(Core :: JavaScript Engine, defect, critical)

ARM
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 --- verified
firefox32 --- verified
firefox33 --- verified
firefox-esr24 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: gkw, Assigned: mjrosenb)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [jsbugmon:][adv-main31+])

Attachments

(3 files, 4 obsolete files)

Attached file stack (obsolete) —
b = RegExp("(?=())")
y = new Object;
a = b.exec('')
a[3] = undefined
Array.prototype.unshift.call(a, undefined)
try {
    a.filterPar(function() {})
} catch (e) {}

crashes js opt shell on m-c changeset 2893f60d5903 with --ion-parallel-compile=off --ion-eager at a weird memory address intermittently.

Setting s-s and assuming sec-critical prior to further analysis, as crashing at a weird address seems bad.

My configure flags are:

CC="gcc-4.7 -mfloat-abi=softfp -B/usr/lib/gcc/arm-linux-gnueabi/4.7" CXX="g++-4.7 -mfloat-abi=softfp -B/usr/lib/gcc/arm-linux-gnueabi/4.7" AR=ar sh /home/fuzz5lin/trees/mozilla-central/js/src/configure --target=arm-linux-gnueabi --enable-optimize --disable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options>

Run this anytime from 1-100 times, or (set disable-randomization if necessary) within gdb, repeatedly run it till it crashes.

I will try to see if I can get a bisection, but no promises yet.

Setting needinfo? from Marty to see what's going on here.
Flags: needinfo?(mrosenberg)
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   http://hg.mozilla.org/mozilla-central/rev/7a1c696cade6
user:        Shu-yu Guo
date:        Mon Apr 07 13:02:20 2014 -0700
summary:     Bug 974201 - Remove filterPar chunking. (r=nmatsakis)

Shu-yu, is bug 974201 likely related in any way?
Blocks: 974201
Flags: needinfo?(shu)
Summary: Crash at a weird memory address involving RegExp → Crash at a weird memory address involving RegExp and filterPar
I don't have any ARM hardware to try to reproduce this. Did this start after irregexp landed?
(In reply to Shu-yu Guo [:shu] from comment #2)
> I don't have any ARM hardware to try to reproduce this. Did this start after
> irregexp landed?

That is, can you reproduce with YARR?
I am unable to reproduce this with the ARM simulator after running the test in a loop for about half an hour.
(In reply to Shu-yu Guo [:shu] from comment #2)
> I don't have any ARM hardware to try to reproduce this. Did this start after
> irregexp landed?

The bisection says bug 974201 is where it starts happening, and that was before irregexp landed.
Clearing NI because I have no idea how to reproduce this. Gary, please re-NI me if you come up with a machine I can ssh into or something.
Flags: needinfo?(shu)
(Shu-yu and I are trying to sort this out asap)
Whiteboard: [jsbugmon:update]
Whiteboard: [jsbugmon:update] → [jsbugmon:]
JSBugMon: Cannot process bug: Unable to automatically reproduce, please track manually.
Critsmash check: needs a recheck.
Flags: needinfo?(gary)
This still reproduces with m-c rev 4876594bacaa.
Attachment #8425230 - Attachment is obsolete: true
Flags: needinfo?(gary)
(In reply to Shu-yu Guo [:shu] from comment #6)
> Clearing NI because I have no idea how to reproduce this. Gary, please re-NI
> me if you come up with a machine I can ssh into or something.

Shu-yu, I now have a machine with access and an updated rev - let's sync up on IRC sometime soon.
Flags: needinfo?(shu)
Shu-yu handed this off to Marty, who took a look at it on my machine. Any discoveries?
Flags: needinfo?(shu)
; so here isthe code that was likely executed right before the code that crashed:
 0xb52feec4:  beq     0xb52feedc
   0xb52feec8:  cmn     r3, #123        ; 0x7b
   0xb52feecc:  beq     0xb52feedc
   0xb52feed0:  ldr     r9, [r9, #4]
   0xb52feed4:  ldr     r0, [r9]
   0xb52feed8:  bx      r0
   0xb52feedc:  mov     pc, lr
   0xb52feee0:  andeq   r0, r0, r0
   0xb52feee4:  andeq   r0, r0, r0
   0xb52feee8:  andeq   r0, r0, r0
   0xb52feeec:  ldrlt   r0, [r4, #-2936]!       ; 0xb78
   0xb52feef0:  ldr     lr, [r9]
   0xb52feef4:  movw    r12, #59424     ; 0xe820
   0xb52feef8:  movt    r12, #46387     ; 0xb533
   0xb52feefc:  cmp     lr, r12
   0xb52fef00:  bne     0xb52fef44
   0xb52fef04:  cmn     r10, #127       ; 0x7f
   0xb52fef08:  bne     0xb52fef44
   0xb52fef0c:  mov     r11, r3
   0xb52fef10:  push    {r9}            ; (str r9, [sp, #-4]!)
   0xb52fef14:  ldr     r9, [r9, #12]
   0xb52fef18:  ldr     r12, [r9, #-12]
   0xb52fef1c:  cmp     r12, r11
   0xb52fef20:  bls     0xb52fef40
   0xb52fef24:  add     r12, r9, r11, lsl #3
   0xb52fef28:  ldr     r11, [r12]
   0xb52fef2c:  ldr     r0, [r12, #4]
   0xb52fef30:  cmn     r0, #124        ; 0x7c
   0xb52fef34:  beq     0xb52fef40
   0xb52fef38:  pop     {r9}            ; (ldr r9, [sp], #4)
   0xb52fef3c:  b       0xb52ee224
   0xb52fef40:  pop     {r9}            ; (ldr r9, [sp], #4)
   0xb52fef44:  b       0xb52ee8a4
   0xb52fef48:  b       0xb52fef58
   0xb52fef4c:                  ; <UNDEFINED> instruction: 0xffff0003
   0xb52fef50:  cdple   14, 10, cr11, cr13, cr15, {7}
   0xb52fef54:  cdple   14, 10, cr11, cr13, cr15, {7}
   0xb52fef58:  andeq   r0, r0, r8
   0xb52fef5c:  andeq   r0, r0, r0
   0xb52fef60:  andeq   r0, r0, r0
   0xb52fef64:  andeq   r0, r0, r0
   0xb52fef68:  andeq   r0, r0, r0
   0xb52fef6c:  andeq   r0, r0, r0
---Type <return> to continue, or q <return> to quit---
   0xb52fef70:  andeq   r0, r0, r0
   0xb52fef74:  andeq   r0, r0, r0
   0xb52fef78:  andeq   r0, r0, r0
   0xb52fef7c:  andeq   r0, r0, r0
   0xb52fef80:  andeq   r0, r0, r0
   0xb52fef84:  andeq   r0, r0, r0
   0xb52fef88:  andeq   r0, r0, r0
   0xb52fef8c:  andeq   r0, r0, r0
   0xb52fef90:  andeq   r0, r0, r0
   0xb52fef94:  andeq   r0, r0, r0
   0xb52fef98:  andeq   r0, r0, r0
   0xb52fef9c:  andeq   r0, r0, r0
   0xb52fefa0:  andeq   r0, r0, r0
   0xb52fefa4:  andeq   r0, r0, r0
   0xb52fefa8:  andeq   r0, r0, r0
   0xb52fefac:  andeq   r0, r0, r0
   0xb52fefb0:  andeq   r0, r0, r0
   0xb52fefb4:  andeq   r0, r0, r0
   0xb52fefb8:  andeq   r0, r0, r0
   0xb52fefbc:  andeq   r0, r0, r0
   0xb52fefc0:  andeq   r0, r0, r0
   0xb52fefc4:  andeq   r0, r0, r0
   0xb52fefc8:  andeq   r0, r0, r0
   0xb52fefcc:  andeq   r0, r0, r0
   0xb52fefd0:  andeq   r0, r0, r0
   0xb52fefd4:  andeq   r0, r0, r0
   0xb52fefd8:  andeq   r0, r0, r0
   0xb52fefdc:  andeq   r0, r0, r0
   0xb52fefe0:  andeq   r0, r0, r0
   0xb52fefe4:  andeq   r0, r0, r0
   0xb52fefe8:  andeq   r0, r0, r0
   0xb52fefec:  andeq   r0, r0, r0
   0xb52feff0:  andeq   r0, r0, r0
   0xb52feff4:  andeq   r0, r0, r0
   0xb52feff8:  andeq   r0, r0, r0
   0xb52feffc:  andeq   r0, r0, r0

the tldr is normal code; normal code; branch around pool; goose eggs.
My current guesses are 1) for whatever reason, we got here and the copying never finished.
or 2) the correct code was stomped over with all zeros, which are a pretty nice nop.  I'll look into this more on monday.
Flags: needinfo?(mrosenberg)
(In reply to Marty Rosenberg [:mjrosenb] from comment #13)
> the tldr is normal code; normal code; branch around pool; goose eggs.
> My current guesses are 1) for whatever reason, we got here and the copying
> never finished.
> or 2) the correct code was stomped over with all zeros, which are a pretty
> nice nop.  I'll look into this more on monday.

Thanks Marty! Assigning to you...
Assignee: nobody → mrosenberg
Status: NEW → ASSIGNED
Marty, when do you think this will get in?
Flags: needinfo?(mrosenberg)
Flags: needinfo?(mrosenberg)
interesting that adding a cc cleared needinfo.
After lots of debugging, I now suspect that the issue is a lack of cache flushing.
Changing the code to always flush the caches immediately seems to make the issue go away (currently in the middle of running it 1,000 times to verify).  gary, can you possibly test this with a simulator build, and ARM_SIM_ICACHE_CHECKS=1 (it is an environment variable)?

At this point, I can just munge the auto flush cache code until this issue goes away, or I can attempt to actually figure out what is going wrong beyond "caches aren't flushed."  I'm leaning towards the latter, but the former is certainly doable, if we want this to go away asap.
Flags: needinfo?(gary)
(In reply to Marty Rosenberg [:mjrosenb] from comment #16)
> gary, can you possibly test this with a simulator build, and
> ARM_SIM_ICACHE_CHECKS=1 (it is an environment variable)?

No, this does not reproduce with a simulator build with or without that environment variable. (I ran about 500 times) Tested on m-c rev da1dbcff9493.
Flags: needinfo?(gary)
Attached patch fixCacheFlusher-r0.patch (obsolete) — Splinter Review
dougc for the refactoring of the AutoFlushICache.  I didn't see any way to make the current flusher flush its contents.  I suspect this may not actually be needed, but it certainly seems simpler than adding in a new AFIC for ever call between the IC top level, and here where code can be modified.

shu for the actual location of forceFlush.  I perused the code, and I *think* that is the only place where it is needed.  I ran the test 100 times with this in place, and didn't see any crashes.
Attachment #8446176 - Flags: review?(shu)
Attachment #8446176 - Flags: review?(dtc-moz)
Comment on attachment 8446176 [details] [diff] [review]
fixCacheFlusher-r0.patch

Review of attachment 8446176 [details] [diff] [review]:
-----------------------------------------------------------------

Great work, Marty.

::: js/src/jit/IonCaches.cpp
@@ +334,5 @@
>          // have not yet executed the code we're patching the jump in.
>          nextStubOffset_.fixup(&masm);
>          CodeLocationJump nextStubJump(code, nextStubOffset_);
>          PatchJump(nextStubJump, CodeLocationLabel(cache_.firstStub_));
> +        AutoFlushICache::ForceFlush();

Nit: keep the newline after the ForceFlush

::: js/src/jit/IonCode.h
@@ +787,5 @@
>  
>    public:
>      static void setRange(uintptr_t p, size_t len);
>      static void flush(uintptr_t p, size_t len);
> +    static void ForceFlush();

Does SM capitalize static methods?
Attachment #8446176 - Flags: review?(shu) → review+
Comment on attachment 8446176 [details] [diff] [review]
fixCacheFlusher-r0.patch

Review of attachment 8446176 [details] [diff] [review]:
-----------------------------------------------------------------

There should be no forcing of the instruction cache flushing because the region pending flushing should never be executed. Flushing that occurs outside the pending region occurs immediately. The PatchJump should be flushing the cache immediately, unless it is in a pending region in which case it should not be executed until after the AutoFlushICache destructor is called. The patch might be a good clue, but the problem needs to be analysed first and there might be some other issue causing problems here.
Attachment #8446176 - Flags: review?(dtc-moz) → review-
Comment on attachment 8446176 [details] [diff] [review]
fixCacheFlusher-r0.patch

Review of attachment 8446176 [details] [diff] [review]:
-----------------------------------------------------------------

In IonCache::attachStub, |attacher.patchStubCodePointer(masm, code);| should be called before |attacher.patchNextStubJump(masm, code);| too, since patchNextStubJump is what makes the new stub reachable.
Here's part of the problem. I would prefer to fix it this way, to keep the code as simple as possible. Strange that the ARM simulator did not pick this up, maybe because it's a race.

Shu: could you combine this with the changes you advised in comment 21?
Flags: needinfo?(shu)
Sorry, I created this problem in Bug 988789.
Blocks: 988789
(In reply to Douglas Crosher [:dougc] from comment #23)
> Sorry, I created this problem in Bug 988789.

Updating flags - bug 988789 landed in 30, 31 and 32, but too late for 30. Thanks for fixing this!
No longer blocks: 974201
Add the changes advised in comment 21, and add some comments.
Attachment #8446272 - Attachment is obsolete: true
Flags: needinfo?(shu)
Are you able to verify that the patch in comment 25 corrects this issue?
Flags: needinfo?(gary)
I'm behind on stuff, but nonetheless this is on my list.
(In reply to Douglas Crosher [:dougc] from comment #26)
> Are you able to verify that the patch in comment 25 corrects this issue?

Yes, without the patch, it crashes within 40-50 runs most of the time. However, with the patch, it did not crash after 400-odd runs.
Flags: needinfo?(gary) → needinfo?(dtc-moz)
Attachment #8446280 - Flags: review?(shu)
Flags: needinfo?(dtc-moz)
Attachment #8446176 - Attachment is obsolete: true
Comment on attachment 8446280 [details] [diff] [review]
Exit an AutoFlushICache dynamic context to ensure the cache has been flushed before the code is executed.

Review of attachment 8446280 [details] [diff] [review]:
-----------------------------------------------------------------

Clean fix. Thanks!

::: js/src/jit/IonCaches.cpp
@@ +393,5 @@
>      // as it can be kept alive even if the cache is flushed (see
>      // MarkJitExitFrame).
>      attacher.patchStubCodePointer(masm, code);
> +
> +    // Update the failure path.

Could you add a comment for posterity that this patch is what makes the stub accessible for parallel ICs, and should not be moved unless the author really knows what's going on?
Attachment #8446280 - Flags: review?(shu) → review+
Address reviewer feedback. Carrying forward r+.
Attachment #8446280 - Attachment is obsolete: true
Attachment #8452036 - Flags: review+
Comment on attachment 8452036 [details] [diff] [review]
Exit an AutoFlushICache dynamic context to ensure the cache has been flushed before the code is executed.

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Difficult, it's a thread race issue.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? For someone looking, yes.

Which older supported branches are affected by this flaw? Release 30, Beta 21, Aurora 32.

If not all supported branches, which bug introduced the flaw? bug 988789.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? The same patch applies to beta 31 and aurora 32.

How likely is this patch to cause regressions; how much testing does it need? It was hard to reproduce, and has been tested to resolve the issue, see comment 28.
Attachment #8452036 - Flags: sec-approval?
Comment on attachment 8452036 [details] [diff] [review]
Exit an AutoFlushICache dynamic context to ensure the cache has been flushed before the code is executed.

Approval Request Comment
[Feature/regressing bug #]: bug 988789.
[User impact if declined]: Intermittent crashes in some JS code.
[Describe test coverage new/current, TBPL]: Tested locally, tbpl try builds.
[Risks and why]: Low. It was an obvious programming mistake. The problem has been reproduced, and the patch tested to resolve the issue.
[String/UUID change made/needed]: n/a
Attachment #8452036 - Flags: approval-mozilla-beta?
Attachment #8452036 - Flags: approval-mozilla-aurora?
Comment on attachment 8452036 [details] [diff] [review]
Exit an AutoFlushICache dynamic context to ensure the cache has been flushed before the code is executed.

sec-approval+ for trunk. I approved Aurora and Branch for after trunk is green.

Does this affect ESR24?
Attachment #8452036 - Flags: sec-approval?
Attachment #8452036 - Flags: sec-approval+
Attachment #8452036 - Flags: approval-mozilla-beta?
Attachment #8452036 - Flags: approval-mozilla-beta+
Attachment #8452036 - Flags: approval-mozilla-aurora?
Attachment #8452036 - Flags: approval-mozilla-aurora+
Landed after obfuscating commit message:

https://hg.mozilla.org/integration/mozilla-inbound/rev/5c805d803e11

(and no, looking at bug 988789, ESR24 should not be affected.)
Target Milestone: --- → mozilla33
Looking at this further, part of the fix addresses an issue introduced in bug 849469 which landed on 22. This patch includes only the part of the patch that needs backporting to 29 to 22 inclusive. This patch is relative to m-c but should be easy to backport if there is any interest.
Changing ESR24 to be "affected" then, based on comment 36.
Might as well request esr24 approval on the patch then.
Comment on attachment 8452807 [details] [diff] [review]
Sub-patch for backporting to ESR 24.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: It is thread race with a simple fix.
User impact if declined: Low. Crashes in some rare JS code. No attempt has been made to reproduce a crash for this issue alone and it might be hard to exploit.
Fix Landed on Version: 33
Risk to taking this patch (and alternatives if risky): Low because it's a small obvious fix.
String or UUID changes made by this patch: n/a

shu: could you double check this?
Attachment #8452807 - Flags: feedback?(shu)
Attachment #8452807 - Flags: approval-mozilla-esr24?
Comment on attachment 8452807 [details] [diff] [review]
Sub-patch for backporting to ESR 24.

Review of attachment 8452807 [details] [diff] [review]:
-----------------------------------------------------------------

You don't need to backport this. PJS isn't turned on anywhere except on Nightly, for precisely this reason of not wanting to have to backport.
Attachment #8452807 - Flags: feedback?(shu)
Attachment #8452807 - Flags: approval-mozilla-esr24?
https://hg.mozilla.org/mozilla-central/rev/5c805d803e11
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [jsbugmon:] → [jsbugmon:][adv-main31+]
Group: javascript-core-security
Gary, can we assume that this was verified on Fx31 based on comment 28?
Flags: needinfo?(gary)
(In reply to Matt Wobensmith from comment #43)
> Gary, can we assume that this was verified on Fx31 based on comment 28?

I just retested that the intermittent crash does not occur on mozilla-beta tip rev 6befadcaa685 but occurs on mozilla-beta regressor rev 7a1c696cade6, so yes, verified.
Status: RESOLVED → VERIFIED
Flags: needinfo?(gary)
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.