Closed Bug 1234147 Opened 8 years ago Closed 8 years ago

crash in js::jit::JitcodeGlobalTable::allocateTower & js::GetOwnPropertyDescriptor

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 + fixed
firefox48 + fixed
firefox49 + fixed
firefox-esr45 47+ fixed

People

(Reporter: Sylvestre, Assigned: shu)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main47+][adv-esr45.2+][post-critsmash-triage])

Crash Data

Attachments

(2 files, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-43b8d14d-1e00-4699-a388-4d1852151216 and
report bp-55fff57e-07a8-4f8d-9e78-a839f2151221
=============================================================

Leaving my Firefox during the week end, it is quite common that it is crashed when I am coming back.
Even the signatures are different, I am grouping then as the context in which they appears are similar.

In the terminal, I can see:
console.error: 
  Could not write session state file 
  out of memory
undefined
console.error: _bugzillajs: 
  out of memory
console.error: _bugzillajs: 
  out of memory
console.error: 
  Could not write session state file 
  out of memory
undefined
console.error: 
console.error: 
console.error: 
console.error: 
console.error: 
console.error: 
console.error: 
console.error: 
console.error: 
  out of memory
console.error: 
  out of memory
console.error: 
console.error: 
  out of memory
console.error: 
console.error: _bugzillajs: 
  out of memory
console.error: _bugzillajs: 
  out of memory
console.error: _bugzillajs: 
  out of memory
console.error: _bugzillajs: 
  out of memory
Does disabling ion prevent this from occurring?
I don't know, I didn't look if this is still happening.
Has this issue been officially confirmed?  I'm encountering it on Firefox 43.0.1 and attempting to check 45.
I can confirm its presence in 45.0 (windows).

The issue is repeatable and encountered when Firefox is open for a prolonged amount of time several days / week.

Am I correct in my interpretation that this issue is related to an elevation in the JavaScript optimization level?  I am attempting to test by disabling basline, ion, and asmjs.

Any ideas on causes or mitigating factors shy of restarting Firefox?
Randell Jesup [:jesup] caught this crash after "Running central for a week with 2000+ tabs on linux64 (maybe 100 or 150 loaded)".

#0  0x00007f27d1365222 in js::jit::JitcodeGlobalTable::allocateTower(unsigned int) (freeList=0x7f27be43ea48)
    at ../../../js/src/jit/JitcodeMap.h:101
#1  0x00007f27d1365222 in js::jit::JitcodeGlobalTable::allocateTower(unsigned int) (this=this@entry=0x7f27be43e800, height=33) at /home/jesup/src/mozilla/head/js/src/jit/JitcodeMap.cpp:685

It's crashing because height=33 in frame #1, and JitcodeSkiplistTower::MAX_HEIGHT is 32.

I'll attach more info from Jesup as an attachment.
Flags: needinfo?(shu)
I'm restricting this as a sec bug since my crash shows an 0xe5e5e5e5 signature (UAF).  Likely either sec-crit or sec-high; I tend to a sec-crit for a UAF in the jit code with unclear origins.

Note: linux64, opt build.  I get occasional crashes of the Content process, and with a local build it's not useful to look at crash-stats/about:crashes, so I tend to leave GDB running on the content process, which is how I caught it.  

I'm attaching a more-complete log of my gdb session, with the changeset ID it was built from.

NI dveditz for sec level assessment.
Group: core-security
Flags: needinfo?(dveditz)
Keywords: csectype-uaf
Attached file head_crash gdb logs
Attachment #8745013 - Attachment is obsolete: true
Since this seems to take a long time to happen and we don't know of a particular way to trigger it at will it's not sec-critical, but sec-high seems like a reasonable worry level.
Flags: needinfo?(dveditz)
Keywords: sec-high
FYI: ~450 crashes in the last month with this signature.

A number of them show e5e5 addresses.  Crashes go back to at least 42.0.  Some have slightly different calling stacks.

Note looking at only maybe 10 I saw an uptime of 1140, so it's likely hittable fairly fast.
Flags: needinfo?(shu)
Attachment #8745528 - Flags: review?(kvijayan)
Attachment #8745528 - Flags: review?(kvijayan) → review+
Comment on attachment 8745528 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

It depends on a PRNG returning 0. Seems pretty hard.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Yeah, it makes it pretty obvious that it'll generate an out-of-bounds index.

> Which older supported branches are affected by this flaw?

I heard someone say 42.

> If not all supported branches, which bug introduced the flaw?

>  Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

The same patch should apply.

> How likely is this patch to cause regressions; how much testing does it need?

Not likely.
Attachment #8745528 - Flags: sec-approval?
(In reply to Shu-yu Guo [:shu] from comment #12)
> Comment on attachment 8745528 [details] [diff] [review]
> Fix height computation for JitcodeSkiplistTower.
> 
> [Security approval request comment]
> > How easily could an exploit be constructed based on the patch?
> 
> It depends on a PRNG returning 0. Seems pretty hard.

So I believe this would cause an out-of-bounds index; however this is happening way too many times for me to think it's a PRNG returning 0 causing it unless it's a) a 16-bit PRNG, or b) really, really badly seeded, or c) being called a truly stupendous number of times (possible, but I hope we're not).

If it's none of those, then there's another source of the problem as well. (I hope not!)
(In reply to Randell Jesup [:jesup] from comment #13)
> (In reply to Shu-yu Guo [:shu] from comment #12)
> > Comment on attachment 8745528 [details] [diff] [review]
> > Fix height computation for JitcodeSkiplistTower.
> > 
> > [Security approval request comment]
> > > How easily could an exploit be constructed based on the patch?
> > 
> > It depends on a PRNG returning 0. Seems pretty hard.
> 
> So I believe this would cause an out-of-bounds index; however this is
> happening way too many times for me to think it's a PRNG returning 0 causing
> it unless it's a) a 16-bit PRNG, or b) really, really badly seeded, or c)
> being called a truly stupendous number of times (possible, but I hope we're
> not).
> 

Good points. The PRNG itself is some xorshift-looking thing with a citation: https://dxr.mozilla.org/mozilla-central/source/js/src/jit/JitcodeMap.cpp#665-670

a) rand_ is a uint32_t
b) It's seeded as 0, this seems independently bad
c) I suppose it could be. I'll compile a local thing and leave it running a bit and see how many iterations until it becomes 0
shu@pikashu ~ 
% cat test.cpp
#include <stdio.h>
#include <stdint.h>
#include <limits.h>

template<typename T>
inline T
RotateLeft(const T aValue, uint_fast8_t aShift)
{
  return (aValue << aShift) | (aValue >> (sizeof(T) * CHAR_BIT - aShift));
}

int main() {
    uint32_t rand_ = 0;
    uint64_t iters = 0;
    do {
        rand_ ^= RotateLeft(rand_, 5) ^ RotateLeft(rand_, 24);
        rand_ += 0x37798849;
        iters++;
    } while (rand_ != 0);

    fprintf(stdout, "\ntook %zu iters\n", iters);
    return 0;
}
                                                                                                                                                                                     
shu@pikashu ~ 
% ./a.out 

took 574295818 iters



Welp.
So that leaves "we're doing this millions of times per run", or it's something else.  Perhaps drop a quick static counter and dump a printf every 100,000 hits, then browse?  If you don't get any/many, it's not hitting 0 by luck (or there's another cause of this failure).
Wouldn't that patch always return 0 or 1 from generateTowerHeight?

I'm new to the codebase, but it looks like to me that generateTowerHeight should ultimately return a value between 1 and 32?

Wouldn't the solution be:

return std::min(JitcodeSkiplistTower::MAX_HEIGHT, std::max(1U, result+1)
(In reply to Chad Dotson from comment #17)
> Wouldn't that patch always return 0 or 1 from generateTowerHeight?
> 
> I'm new to the codebase, but it looks like to me that generateTowerHeight
> should ultimately return a value between 1 and 32?
> 
> Wouldn't the solution be:
> 
> return std::min(JitcodeSkiplistTower::MAX_HEIGHT, std::max(1U, result+1)

Oh oops, yes you're right, not being very bright today. It should be a max(1, result), not min(1, result).

The clipping at MAX_HEIGHT is controlled by the bit-counting loop itself, though that's not very clear.
(In reply to Randell Jesup [:jesup] from comment #16)
> So that leaves "we're doing this millions of times per run", or it's
> something else.  Perhaps drop a quick static counter and dump a printf every
> 100,000 hits, then browse?  If you don't get any/many, it's not hitting 0 by
> luck (or there's another cause of this failure).

Yes, I'll try that. Chad, do you have Gecko Profiler Addon installed?
Flags: needinfo?(chad)
Comment on attachment 8745528 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

Review of attachment 8745528 [details] [diff] [review]:
-----------------------------------------------------------------

Wouldn't this always return 0 or 1 from generateTowerHeight?

What about this instead from generateTowerHeight:

return std::min(JitcodeSkiplistTower::MAX_HEIGHT, std::max(1U, result+1)
(In reply to Shu-yu Guo [:shu] from comment #19)
> (In reply to Randell Jesup [:jesup] from comment #16)
> > So that leaves "we're doing this millions of times per run", or it's
> > something else.  Perhaps drop a quick static counter and dump a printf every
> > 100,000 hits, then browse?  If you don't get any/many, it's not hitting 0 by
> > luck (or there's another cause of this failure).
> 
> Yes, I'll try that. Chad, do you have Gecko Profiler Addon installed?

No unfortunately.
I threw the printf in.  Loading huffingtonpost into a tab, and letting it sit there (debug build) hits the generator 100000 times every few minutes - with one, idle (though fancy) page loaded.  Wow.....  Still not sure, but this may mean it could hit 0's occasionally.

This also says "this is a potential perf hotspot".
(In reply to Randell Jesup [:jesup] from comment #22)
> I threw the printf in.  Loading huffingtonpost into a tab, and letting it
> sit there (debug build) hits the generator 100000 times every few minutes -
> with one, idle (though fancy) page loaded.  Wow.....  Still not sure, but
> this may mean it could hit 0's occasionally.
> 
> This also says "this is a potential perf hotspot".

Yeah this is very surprising to me, since that code, IIRC, shouldn't be hot at all if profiling is turned off. Let me try to page this stuff back in...
NIing Kannan since he designed this.

Any ideas on how to make this less hot? The comment suggests that we allocate a JitcodeGlobalEntry for every baseline compile because we can't do this post-hoc for on-stack scripts.

I'm actually not sure why that is. When we flip on profiling, can't we just take a list of all the on-stack scripts and make JitcodeGlobalEntries for them then?
Flags: needinfo?(chad) → needinfo?(kvijayan)
(In reply to Randell Jesup [:jesup] from comment #22)
> I threw the printf in.  Loading huffingtonpost into a tab, and letting it
> sit there (debug build) hits the generator 100000 times every few minutes -
> with one, idle (though fancy) page loaded.  Wow.....  Still not sure, but
> this may mean it could hit 0's occasionally.
> 
> This also says "this is a potential perf hotspot".

The tests I've run have included little human interaction after initial page load.  Though the content does continue to update.
Comment on attachment 8745528 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

Clearing sec-approval? while y'all debate what's going on. Please set it to ? again when you have a final patch for sure.
Attachment #8745528 - Flags: sec-approval?
(In reply to Shu-yu Guo [:shu] from comment #24)
> NIing Kannan since he designed this.
> 
> Any ideas on how to make this less hot? The comment suggests that we
> allocate a JitcodeGlobalEntry for every baseline compile because we can't do
> this post-hoc for on-stack scripts.
> 
> I'm actually not sure why that is. When we flip on profiling, can't we just
> take a list of all the on-stack scripts and make JitcodeGlobalEntries for
> them then?

This is a reasonable alternative approach.  I didn't think it would be necessary, though since we don't expect script compilation to happen a ton.  Is it Ion IC generation that's causing it?  Or baseline compilation?

Fundamentally, compilation should be "rare", and this skiplist generation code should be very cheap compared to compiling ANYTHING.  So the reasoning was that it would be a small delta compared to the cost of compilation.  If that reasoning is off, it makes sense to implement the alternative you have suggested.

The key requirement is: there should exist entries in the map for every piece of jitcode before the first sample is taken.
Flags: needinfo?(kvijayan)
Guys - this is a sec-high UAF - we need to land a fix ASAP and uplift to aurora/beta and probably to ESR45.  While I really want to understand/fix if it's an indicator we're doing something too often (perf is important!) that should NOT block landing a fix for the crash/security fix.  I keep my local build sitting in GDB (just in case, since crash-stats reports are worthless for local builds), and all my local crashes (every few days-ish) are this.

The answer to the open question of "did we analyze what caused this correctly" seems to be yes, and it's being called a lot more often than we anticipated.
Flags: needinfo?(shu)
BTW, I verified that when I crash with this, JtcodeGlobalTable::rand_ is 0
Oops, sorry I forgot to land this! I will land a fix now and worry about the performance later.
Flags: needinfo?(shu)
Or rather, I will re-request sec-approval for the fix now and worry about performance in the meantime.
Carrying r=djvj with min/max typo fixed.
Attachment #8745528 - Attachment is obsolete: true
Attachment #8747597 - Flags: review+
Comment on attachment 8747597 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

We need to land a security fix; perf problems will be looked at later.


[Security approval request comment]
How easily could an exploit be constructed based on the patch?

It's clear there's a bounds issue with returning 0. The patch itself doesn't make it clear how to get the PRNG to return 0---which we know happens with running it enough times---but isn't mentioned by the code.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

On the nature of the problem, but again, not on the fact that the PRNG can return 0 if run for long enough.

Which older supported branches are affected by this flaw?

All branches I think, this landed a while ago.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Should not need different backport.

How likely is this patch to cause regressions; how much testing does it need?

Will not cause regressions.
Attachment #8747597 - Flags: sec-approval?
Group: core-security → javascript-core-security
Comment on attachment 8747597 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

sec-approval+.

We should backport this to branches, including ESR45.
Attachment #8747597 - Flags: sec-approval? → sec-approval+
Assuming fix is accepted, what is the general timeline for it appearing in the different builds?
Setting checkin-needed until my SSH key is updated.
Keywords: checkin-needed
this caused a bustage and had to be backed out -> https://treeherder.mozilla.org/logviewer.html#?job_id=27439259&repo=mozilla-inbound
Flags: needinfo?(shu)
Looks like a min/max name collision from windows.h.
just wrap std::max with parens.

return (std::max)(1U, result);
Flags: needinfo?(shu)
https://hg.mozilla.org/mozilla-central/rev/6a30c1484e91
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Group: javascript-core-security → core-security-release
Hello Shu, could you please nominate patches for uplift to Beta47, Aurora48, ESR45? Thanks!
Flags: needinfo?(shu)
FYI, I've had no crashes on mozilla-central (with 2000+ tabs, 100+ loaded) in ~ two weeks with this patch
Comment on attachment 8747597 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: UAF crash when leaving browser open for a long time 
Fix Landed on Version: 49?
Risk to taking this patch (and alternatives if risky): I don't know
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Approval Request Comment
[Feature/regressing bug #]: 1004831
[User impact if declined]: UAF crash if leaving browser open for a long time
[Describe test coverage new/current, TreeHerder]: on m-c
[Risks and why]: I don't know
[String/UUID change made/needed]: None
Flags: needinfo?(shu)
Attachment #8747597 - Flags: approval-mozilla-esr45?
Attachment #8747597 - Flags: approval-mozilla-beta?
Attachment #8747597 - Flags: approval-mozilla-aurora?
Comment on attachment 8747597 [details] [diff] [review]
Fix height computation for JitcodeSkiplistTower.

Sec-high, Jesup cannot repro crashes after the fix (thanks for testing!), Aurora48+, Beta47+, ESR45+
Attachment #8747597 - Flags: approval-mozilla-esr45?
Attachment #8747597 - Flags: approval-mozilla-esr45+
Attachment #8747597 - Flags: approval-mozilla-beta?
Attachment #8747597 - Flags: approval-mozilla-beta+
Attachment #8747597 - Flags: approval-mozilla-aurora?
Attachment #8747597 - Flags: approval-mozilla-aurora+
Whiteboard: [adv-main47+][adv-esr45.2+]
(In reply to Kannan Vijayan [:djvj] from comment #27)
> (In reply to Shu-yu Guo [:shu] from comment #24)
> > NIing Kannan since he designed this.
> > 
> > Any ideas on how to make this less hot? The comment suggests that we
> > allocate a JitcodeGlobalEntry for every baseline compile because we can't do
> > this post-hoc for on-stack scripts.
> > 
> > I'm actually not sure why that is. When we flip on profiling, can't we just
> > take a list of all the on-stack scripts and make JitcodeGlobalEntries for
> > them then?
> 
> This is a reasonable alternative approach.  I didn't think it would be
> necessary, though since we don't expect script compilation to happen a ton. 
> Is it Ion IC generation that's causing it?  Or baseline compilation?
> 
> Fundamentally, compilation should be "rare", and this skiplist generation
> code should be very cheap compared to compiling ANYTHING.  So the reasoning
> was that it would be a small delta compared to the cost of compilation.  If
> that reasoning is off, it makes sense to implement the alternative you have
> suggested.
> 
> The key requirement is: there should exist entries in the map for every
> piece of jitcode before the first sample is taken.

I suspect you should file a followup to find out if this is being called too much, and if so why.
Flags: needinfo?(kvijayan)
Filed bug 1276261 for this.
Flags: needinfo?(kvijayan)
Whiteboard: [adv-main47+][adv-esr45.2+] → [adv-main47+][adv-esr45.2+][post-critsmash-triage]
Group: core-security-release
Depends on: 1351947
You need to log in before you can comment on or make changes to this bug.