Closed Bug 1272523 Opened 8 years ago Closed 8 years ago

Assertion failure: args[0].isString(), at js/src/builtin/Intl.cpp:835

Categories

(Core :: JavaScript Engine, defect)

42 Branch
x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 --- wontfix
firefox48 --- disabled
firefox49 - fixed
firefox-esr45 --- disabled

People

(Reporter: gkw, Assigned: arai)

References

Details

(Keywords: assertion, sec-high, testcase, Whiteboard: [jsbugmon:])

Attachments

(3 files, 2 obsolete files)

The following testcase crashes on mozilla-central revision c3f5e6079284 (build with --enable-debug --enable-more-deterministic --32, run with --fuzzing-safe --no-threads --ion-eager --gc-zeal=14 --no-fpu --ion-extra-checks):

See attachment.

Backtrace:

0   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x00605f02 js::intl_availableCollations(JSContext*, unsigned int, JS::Value*) + 1282 (Intl.cpp:835)
1   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x0098f42c js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) + 204 (jscntxtinlines.h:236)
2   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x0097fa4c js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) + 764 (Interpreter.cpp:468)
3   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x0097fe0c InternalCall(JSContext*, js::AnyInvokeArgs const&) + 156 (Interpreter.cpp:525)
4   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x0097fd61 js::CallFromStack(JSContext*, JS::CallArgs const&) + 17 (Interpreter.cpp:531)
5   js-dbg-32-dm-clang-darwin-c3f5e6079284	0x00308437 js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) + 1591 (BaselineIC.cpp:5973)

For detailed crash information, see attachment.

This is a fragile testcase. Waldo helped poke at this in-person, and we agree there is no harm setting this s-s for now.
Note that this is a hard-to-reproduce testcase. Happens on Linux as well, but seems to be 32-bit only.

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/95a15364952f
user:        Tooru Fujisawa
date:        Thu Apr 28 23:47:07 2016 +0900
summary:     Bug 1268138 - Call StringSplitString directly for internal use. r=till

Arai-san, is bug 1268138 a likely regressor?
Blocks: 1268138
Flags: needinfo?(arai.unmht)
OS: Mac OS X → All
Whiteboard: [jsbugmon:update] → [jsbugmon:]
We rr'd this a bit until hitting an apparent rr reverse-continue iloop with a hardware watchpoint.  The value of args[0] is JS_ELEMENT_HOLE or however it's spelled.

Working backward via js::DumpBacktrace, it appears this bogo-value is something possibly produced within Intl.js:ResolveLocale.  That method's descendants call StringSplitString and various internal regexp gunk, both of which have been rewritten/substantially modified recently.  Combined with the bisection result, and it being easy to imagine an array resulting from a split erroneously containing holes, it seems likely the problem is with that prior split/regexp work.

I might debug further for the fun/experience of it, ordinarily, but I don't have access to a fully-functional laptop right now running Linux with working rr on it.  So that's as far as I think I can reasonably go now.
> Waldo helped poke at this in-person, and we agree there is no harm setting this s-s for now.

Note that the potential regressing bug 1268138 is also s-s.
Attached file Original testcase (obsolete) —
Erm, wait. This is the original testcase, the one I attached while filing the bug is the slightly reduced one.
Attachment #8751991 - Attachment is obsolete: true
Comment on attachment 8751992 [details]
Original testcase

Oops, seems like I keep uploading dupes. Never mind...
Attachment #8751992 - Attachment is obsolete: true
https://dxr.mozilla.org/mozilla-central/rev/c3f5e6079284a7b7053c41f05d0fe06ff031db03/js/src/builtin/Intl.js#571
> function DefaultLocale() {
>     const runtimeDefaultLocale = RuntimeDefaultLocale();
>     if (runtimeDefaultLocale === localeCache.runtimeDefaultLocale)
>         return localeCache.defaultLocale;

There, `localeCache.defaultLocale` is `undefined` while `localeCache.runtimeDefaultLocale` is a string.
Maybe OOM happens between the following lines, or something breaks the `localeCache.defaultLocale` data at somewhere else?

>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;

Will continue investigating (adding DumpMessage frequently changes the situation...)
Apparently, swapping those lines fixed the issue, locally.

>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;
Flags: needinfo?(arai.unmht)
`localeCache.runtimeDefaultLocale` is used for checking whether locale data is cached or not, and `localeCache.defaultLocale` is the actual data, so we should store `localeCache.defaultLocale` first.  So that even if OOM happens while storing data, and localeCache is partially initialized, the data won't be used.

same for `localeCandidateCache`.
Assignee: nobody → arai.unmht
Attachment #8752068 - Flags: review?(jwalden+bmo)
Forgot to mention about the testcase.
I think we should reduce the testcase size if we want to store it in-tree, so currently the patch doesn't have the testcase.
Will try reducing.
So, I think the issue comes from bug 1175347.  and bug 1268138 changed the OOM timing around there.
Comment on attachment 8752068 [details] [diff] [review]
Store locale cache data in more robust order.

Review of attachment 8752068 [details] [diff] [review]:
-----------------------------------------------------------------

Ugh.  Did I really mix up the ordering of these?  :-(  I specifically thought about this issue when writing the regressing changes, yet somehow I still managed to get it wrong.  Bah.
Attachment #8752068 - Flags: review?(jwalden+bmo) → review+
Now reducing the testcase, but surely it's so fragile, and I think, trying to cause OOM at specific line by tweaking un-related code doesn't make sense, as the situation can change shortly, and then the testcase is no more valid.
Do you think it's okay not to attach testcase to the patch?
Flags: needinfo?(jwalden+bmo)
Comment 15 doesn't seem to explain why the test case is so fragile.

The purpose of oomTest() is to test all OOM paths out of a chunk of code. The algorithm is quite simple: it runs the code many times, simulating an OOM on the first allocation, then the second allocation, etc. until no OOM occurs.

If we know what the bug was, why is it hard to produce a solid testcase for it?
Flags: needinfo?(jwalden+bmo) → needinfo?(arai.unmht)
thank you for your explanation :)
I was misunderstanding the oomTest.

actually, the assertion failure frequently disappeared while reducing the testcase, even if I remove/tweak the code that is never executed, and got stuck there.

will try creating testcase from scratch.
now I'm trying with the following code

  oomTest(() => {
    newGlobal().eval(`
      new Intl.Collator("en-US").compare("a", "b");
      new Intl.Collator("de").compare("a", "b");
    `);
  });


as the problematic code is in the initialization part, we need new global per each run.

and added DumpMessage's to DefaultLocale function, to see if it's executed or not for each iteration.

> function DefaultLocale() {
> DumpMessage("DefaultLocale called\n");
>     const runtimeDefaultLocale = RuntimeDefaultLocale();
>     if (runtimeDefaultLocale === localeCache.runtimeDefaultLocale) {
> DumpMessage("cached\n");
>         return localeCache.defaultLocale;
>     }
> DumpMessage("not cached\n");


then, running with |OOM_VERBOSE=1| shows following:

thread 1
  allocation 1
  allocation 2
  ...
  allocation 727
  allocation 728
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 729
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 730
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 731
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 732
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 733
  allocation 734
  ...
  allocation 779
  allocation 780
"DefaultLocale called\n"
"not cached\n"
"DefaultLocale called\n"
"cached\n"
  allocation 781
  allocation 782
  ...

So, the execution doesn't seem to be deterministic for each iteration, as allocation 733 to 779 don't show the message.

Also, at the first case there the DumpMessage is executed, "allocation 728", message for both following lines are shown ("not cached" and "cached").

>     new Intl.Collator("en-US").compare("a", "b");
>     new Intl.Collator("de").compare("a", "b");

Am I missing something?
was misunderstanding the required situation  :P

so, we should follow the following steps:
  1. call `DefaultLocale`
  2. `localeCache.runtimeDefaultLocale` is undefined
  3. initialize `localeCache.runtimeDefaultLocale`
  4. hit OOM before initializing `localeCache.defaultLocale`
  5. call `DefaultLocale`
  6. `localeCache.runtimeDefaultLocale` is initialized
  7. `localeCache.defaultLocale` is undefined

after hitting OOM, we should use the cache, so newGlobal shouldn't be there.
while testing, I hit strange behavior in oomTest.

I added the following logging, to see the curent counter and maxAllocations value, for each possible OOM point:

> inline bool
> ShouldFailWithOOM()
> {
>   fprintf(stderr, "ShouldFailWithOOM: %d/%d\n", (uint32_t)counter, (uint32_t)maxAllocations);

and also added the following logging, to see the expected range.

> function DefaultLocale() {
> ...
>   DumpMessage("before localeCache.runtimeDefaultLocale");
>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;
>   DumpMessage("after localeCache.defaultLocale");

and the output is following

>   allocation 767
> ShouldFailWithOOM: 410570/411337
> ShouldFailWithOOM: 410571/411337
> ...
> ShouldFailWithOOM: 411336/411337
> ShouldFailWithOOM: 411337/411337
> ShouldFailWithOOM: 411338/411337
> ...
> ShouldFailWithOOM: 2884446/435512
> "before localeCache.runtimeDefaultLocale"
> ShouldFailWithOOM: 2884447/435512
> ShouldFailWithOOM: 2884448/435512
> ShouldFailWithOOM: 2884449/435512
> ShouldFailWithOOM: 2884450/435512
> ShouldFailWithOOM: 2884451/435512
> ShouldFailWithOOM: 2884452/435512
> ShouldFailWithOOM: 2884453/435512
> ShouldFailWithOOM: 2884454/435512
> ShouldFailWithOOM: 2884455/435512
> ShouldFailWithOOM: 2884456/435512
> ShouldFailWithOOM: 2884457/435512
> ShouldFailWithOOM: 2884458/435512
> "after localeCache.defaultLocale"
> ShouldFailWithOOM: 2884459/435512
> ...
> ShouldFailWithOOM: 4346200/451195
>   allocation 768
> ShouldFailWithOOM: 4346201/4346969

apparently, it overran too much, and the initialization completes on first run.
Waldo, can you help me?

I'm not sure how I can create a reliable testcase here.


unless I'm missing something, the following testcase should hit the expected path.

oomTest(() => {
  new Intl.Collator("en-US").compare("a", "b");
  new Intl.Collator("de").compare("a", "b");
});


the expected path is following:

  in the iteration N
    1. call `DefaultLocale`
    2. `localeCache.runtimeDefaultLocale` is undefined
    3. initialize `localeCache.runtimeDefaultLocale`
    4. hit OOM before initializing `localeCache.defaultLocale`

  in the iteration N+1
    5. call `DefaultLocale`
    6. `localeCache.runtimeDefaultLocale` is initialized
    7. `localeCache.defaultLocale` is undefined
    8. the |args[0].isString()| assertion fails because it's undefined

but as mentioned in comment #20, it overruns before the iteration N, and initialized everything.
do you have any idea to test this more reliably?
Flags: needinfo?(arai.unmht) → needinfo?(jwalden+bmo)
accidentally removed ni? me.
Flags: needinfo?(arai.unmht)
(In reply to Tooru Fujisawa [:arai] from comment #21)
> but as mentioned in comment #20, it overruns before the iteration N, and
> initialized everything.

You mean, the JIT took the operation sequence |x = a; y = b;| and sometimes performs |y = b| before it performs |x = a|?  That is, it's performing operations out of order?  Or something else?
Flags: needinfo?(jwalden+bmo)
>   allocation 767
> ShouldFailWithOOM: 410570/411337

What? This makes no sense. How can maxAllocations be 411337? I would rebuild everything --- and then take a look in the debugger.

> ShouldFailWithOOM: 411336/411337
> ShouldFailWithOOM: 411337/411337
> ShouldFailWithOOM: 411338/411337

This also makes no sense. The program would have to be ignoring an error somewhere.

> ShouldFailWithOOM: 4346200/451195
>   allocation 768

This also makes no sense. OOMTest() should break out of the loop once HadSimulatedOOM() is true.

>   fprintf(stderr, "ShouldFailWithOOM: %d/%d\n", (uint32_t)counter, (uint32_t)maxAllocations);

OK, I think this might be undefined behavior. It couldn't possibly be causing what you're seeing, but it doesn't hurt to eliminate some uncertainty:

    fprintf(stderr, "ShouldFailWithOOM: %lu/%lu\n", (unsigned long)counter, (unsigned long)maxAllocations);
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #23)
> (In reply to Tooru Fujisawa [:arai] from comment #21)
> > but as mentioned in comment #20, it overruns before the iteration N, and
> > initialized everything.
> 
> You mean, the JIT took the operation sequence |x = a; y = b;| and sometimes
> performs |y = b| before it performs |x = a|?  That is, it's performing
> operations out of order?  Or something else?

I meant, oomTest doesn't stop at the expected allocation, and executes both of the following lines at once:
>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;

once the cache is initialized, the expected situation cannot be reached.


(In reply to Jason Orendorff [:jorendorff] from comment #24)
> >   allocation 767
> > ShouldFailWithOOM: 410570/411337
> 
> What? This makes no sense. How can maxAllocations be 411337? I would rebuild
> everything --- and then take a look in the debugger.
> 
> > ShouldFailWithOOM: 411336/411337
> > ShouldFailWithOOM: 411337/411337
> > ShouldFailWithOOM: 411338/411337
> 
> This also makes no sense. The program would have to be ignoring an error
> somewhere.

Okay, will check where the OOM is ignored.


> > ShouldFailWithOOM: 4346200/451195
> >   allocation 768
> 
> This also makes no sense. OOMTest() should break out of the loop once
> HadSimulatedOOM() is true.
> 
> >   fprintf(stderr, "ShouldFailWithOOM: %d/%d\n", (uint32_t)counter, (uint32_t)maxAllocations);
> 
> OK, I think this might be undefined behavior. It couldn't possibly be
> causing what you're seeing, but it doesn't hurt to eliminate some
> uncertainty:
> 
>     fprintf(stderr, "ShouldFailWithOOM: %lu/%lu\n", (unsigned long)counter,
> (unsigned long)maxAllocations);

Thanks :)
Okay, I think I found the reason.

Here's the call stack for the ignored OOM:

  js::oom::ShouldFailWithOOM() + 175
  js::LifoAlloc::alloc(unsigned long) + 29
  js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::Node* js::LifoAlloc::new_<js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::Node, js::jit::LiveRange* const&>(js::jit::LiveRange* const&&&) + 35
  js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::allocateNode(js::jit::LiveRange* const&) + 104
  js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::insert(js::jit::LiveRange* const&) + 47
  js::jit::BacktrackingAllocator::addInitialFixedRange(js::jit::AnyRegister, js::jit::CodePosition, js::jit::CodePosition) + 144
  js::jit::BacktrackingAllocator::buildLivenessInfo() + 1837
  js::jit::BacktrackingAllocator::go() + 76
  js::jit::GenerateLIR(js::jit::MIRGenerator*) + 872
  js::jit::CompileBackEnd(js::jit::MIRGenerator*) + 133
  js::jit::IonCompile(JSContext*, JSScript*, js::jit::BaselineFrame*, unsigned char*, bool, bool, js::jit::OptimizationLevel) + 2846
  js::jit::Compile(JSContext*, JS::Handle<JSScript*>, js::jit::BaselineFrame*, unsigned char*, bool, bool) + 1171
  js::jit::CanEnter(JSContext*, js::RunState&) + 812
  js::RunScript(JSContext*, js::RunState&) + 399
  js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) + 1211
  InternalCall(JSContext*, js::AnyInvokeArgs const&) + 487
  js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) + 118
  JS_CallFunction(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSFunction*>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) + 606
  OOMTest(JSContext*, unsigned int, JS::Value*) + 1347
  js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) + 173
  js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) + 1006
  InternalCall(JSContext*, js::AnyInvokeArgs const&) + 487
  js::CallFromStack(JSContext*, JS::CallArgs const&) + 29
  js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) + 1983


the definition of js::LifoAlloc::alloc is following:

https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/ds/LifoAlloc.h#272
>     MOZ_ALWAYS_INLINE
>     void* alloc(size_t n) {
>         JS_OOM_POSSIBLY_FAIL();
>         return allocImpl(n);
>     }

JS_OOM_POSSIBLY_FAIL is defined as following:

https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/public/Utility.h#152
> #  define JS_OOM_POSSIBLY_FAIL()                                              \
>     do {                                                                      \
>         if (js::oom::ShouldFailWithOOM())                                     \
>             return nullptr;                                                   \
>     } while (0)


Then, up to IonCompile, OOM exception is not reported.

[js::LifoAlloc::new_]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/ds/LifoAlloc.h#424
>     JS_DECLARE_NEW_METHODS(new_, alloc, MOZ_ALWAYS_INLINE)

https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/public/Utility.h#324
>         void* memory = ALLOCATOR(sizeof(T)); \
>         return MOZ_LIKELY(memory) \
>             ? new(memory) T(mozilla::Forward<Args>(args)...) \
>             : nullptr; \

[js::SplayTree::allocateNode]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/ds/SplayTree.h#197
>         return alloc->new_<Node>(v);

[js::SplayTree::insert]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/ds/SplayTree.h#97
>         Node* element = allocateNode(v);
>         if (!element)
>             return false;

[js::jit::BacktrackingAllocator::addInitialFixedRange]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/BacktrackingAllocator.cpp#463
>     return range && registers[reg.code()].allocations.insert(range);

[js::jit::BacktrackingAllocator::buildLivenessInfo]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/BacktrackingAllocator.cpp#574
>                         if (!addInitialFixedRange(*iter, outputOf(*ins), outputOf(*ins).next()))
>                             return false;

[js::jit::BacktrackingAllocator::go]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/BacktrackingAllocator.cpp#812
>     if (!buildLivenessInfo())
>         return false;

[js::jit::GenerateLIR]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#1907
>             if (!regalloc.go())
>                 return nullptr;
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#1926
>             if (!regalloc.go())
>                 return nullptr;

[js::jit::CompileBackEnd]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#1978
>     LIRGraph* lir = GenerateLIR(mir);
>     if (!lir)
>         return nullptr;


and js::jit::IonCompile checks if any exception is pending, for error case.

https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#2248
>         codegen = CompileBackEnd(builder);
>         if (!codegen) {
>             JitSpew(JitSpew_IonAbort, "Failed during back-end compilation.");
>             if (cx->isExceptionPending())
>                 return AbortReason_Error;
>             return AbortReason_Disable;
>         }

So, if the allocation fails in CompileBackEnd, it's not treated as an Error.


and AbortReason_Disable is handled like following:

[js::jit::Compile]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#2420
>     AbortReason reason = IonCompile(cx, script, osrFrame, osrPc, constructing,
>                                     recompile, optimizationLevel);
> ...
>     if (reason == AbortReason_Disable)
>         return Method_CantCompile;

[js::jit::CanEnter]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/jit/Ion.cpp#2511
>     MethodStatus status = Compile(cx, rscript, nullptr, nullptr, constructing);
>     if (status != Method_Compiled) {
>         if (status == Method_CantCompile)
>             ForbidCompilation(cx, rscript);
>         return status;

[js::RunScript]
https://dxr.mozilla.org/mozilla-central/rev/c4449eab07d39e20ea315603f1b1863eeed7dcfe/js/src/vm/Interpreter.cpp#402
>     if (jit::IsIonEnabled(cx)) {
>         jit::MethodStatus status = jit::CanEnter(cx, state);
>         if (status == jit::Method_Error)
>             return false;
>         if (status == jit::Method_Compiled) {
>             jit::JitExecStatus status = jit::IonCannon(cx, state);
>             return !IsErrorStatus(status);
>         }
>     }


It just fails to enter Ion and continues Interpreter execution.
That's the reason why it overruns in oomTest.
jandem, I think this is a similar case as bug 1263558's Part 0.1 and 0.2:
  https://hg.mozilla.org/mozilla-central/rev/611130fe9f93
  https://hg.mozilla.org/mozilla-central/rev/344a4bcc9015

And I have 2 questions:
  1. should we return AbortReason_Error from js::jit::IonCompile for this specific case (comment #26) ?  (I think we should!)
  2. which functions are responsible for propagating/handling OOM information?
Flags: needinfo?(arai.unmht) → needinfo?(jdemooij)
(In reply to Tooru Fujisawa [:arai] from comment #27)
>   1. should we return AbortReason_Error from js::jit::IonCompile for this
> specific case (comment #26) ?  (I think we should!)

I'm not sure - Ion compilation is not critical so if we OOM it doesn't seem unreasonable to abort compilation and continue execution.

Also, if we do change this, won't we run into the same issue elsewhere? We have other places where we recover from or ignore OOM...
Flags: needinfo?(jdemooij)
Thanks :)

okay, so, this specific issue cannot be tested with oomTest in simple script, because of the following two conflicting requirements:
  a) should cause OOM at specific point, in Ion execution
  b) should not execute whole script

(a) comes from, the possibly OOM path is not taken in Interpreter execution for the specific code:
>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;

as the properties already exist at the declaration:
> var localeCache = {
>     runtimeDefaultLocale: undefined,
>     defaultLocale: undefined,
> };


(b) comes from, we want to cause OOM after initializing `localeCache.runtimeDefaultLocale`, but we should never initialize `localeCache.defaultLocale`.


Still not sure why the testcase in comment #2 hit this, avoiding the issue between oomTest and Ion compilation.... :/
I guess I need to check the comment #2 testcase again.
the description about (b) wasn't accurate.

as it's an initialization function, after executing whole script, the initialization path won't be executed on next call.
I was misunderstanding the situation.

Looks like, the OOM happens inside Baseline execution, not Ion.
Will look into baseline.
Okay, so, --no-fpu disables SSE only on 32bit, and in that case Ion is disabled, as jitSupportsFloatingPoint is false.

> static inline bool
> IsIonEnabled(JSContext* cx)
> {
>     // The ARM64 Ion engine is not yet implemented.
> #if defined(JS_CODEGEN_NONE) || defined(JS_CODEGEN_ARM64)
>     return false;
> #else
>     return cx->runtime()->options().ion() &&
>            cx->runtime()->options().baseline() &&
>            cx->runtime()->jitSupportsFloatingPoint;
> #endif
> }

if Ion is disabled, following simple testcase hits OOM inside jit::AttachBaselineCacheIRStub, and execution doesn't stop.

  oomTest(String.prototype.localeCompare);

again, we face the issue related to JIT compilation and oomTest.
The requirements are following:
  * Execute the script in baseline
  * do not ignore all OOM, at least before hitting the property initialization code
    (that means, hit all OOM in critical places, not inside JIT compilation or optimization)
  * hit OOM between the following 2 lines
>     localeCache.runtimeDefaultLocale = runtimeDefaultLocale;
>     localeCache.defaultLocale = locale;

will check how the testcase in comment #2 avoids the OOM in optimization path.
(maybe, the optimization path is not taken because of other code executed before it, or memory allocation is not performed there somehow?)
here's the last 3 backtraces for crash case

[allocation 4127]
js::oom::ShouldFailWithOOM() [clone .part.27] at /usr/include/i386-linux-gnu/bits/stdio2.h:98
mozilla::Vector<unsigned char, 256u, js::SystemAllocPolicy>::reserve(unsigned int) at /home/osboxes/projects/mozilla-central/obj-sm/dist/include/mozilla/Vector.h:993
js::jit::TypedRegisterSet<js::jit::Register>::getAny() const at /home/osboxes/projects/mozilla-central/js/src/jit/RegisterSets.h:393
js::jit::BaselineCompilerShared::callVM(js::jit::VMFunction const&, js::jit::BaselineCompilerShared::CallVMPhase) at /home/osboxes/projects/mozilla-central/js/src/jit/shared/BaselineCompiler-shared.cpp:140
js::jit::BaselineCompiler::emit_JSOP_DEBUGCHECKSELFHOSTED() at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineCompiler.cpp:4322
js::jit::BaselineCompiler::emitBody() at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineCompiler.cpp:1007
js::jit::BaselineCompiler::compile() at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineCompiler.cpp:116
js::jit::BaselineCompile(JSContext*, JSScript*, bool) at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineJIT.cpp:293
CanEnterBaselineJIT(JSContext*, JS::Handle<JSScript*>, js::InterpreterFrame*) at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineJIT.cpp:330 (discriminator 6)
js::jit::CanEnterBaselineMethod(JSContext*, js::RunState&) at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineJIT.cpp:391
js::RunScript(JSContext*, js::RunState&) at /home/osboxes/projects/mozilla-central/js/src/vm/Interpreter.cpp:413
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) at /home/osboxes/projects/mozilla-central/js/src/vm/Interpreter.cpp:498
InternalCall(JSContext*, js::AnyInvokeArgs const&) at /home/osboxes/projects/mozilla-central/js/src/vm/Interpreter.cpp:526
js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICCall_Fallback*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineIC.cpp:5973

[allocation 4128]
js::oom::ShouldFailWithOOM() [clone .part.27] at /usr/include/i386-linux-gnu/bits/stdio2.h:98
bool js::gc::GCRuntime::checkAllocatorState<(js::AllowGC)1>(JSContext*, js::gc::AllocKind) at /home/osboxes/projects/mozilla-central/js/src/gc/Allocator.cpp:74
js::Shape* js::Allocate<js::Shape, (js::AllowGC)1>(js::ExclusiveContext*) at /home/osboxes/projects/mozilla-central/js/src/gc/Allocator.cpp:213
js::Shape::new_(js::ExclusiveContext*, JS::Handle<js::StackShape>, unsigned int) at /home/osboxes/projects/mozilla-central/js/src/vm/Shape-inl.h:87
js::NativeObject::getChildProperty(js::ExclusiveContext*, JS::Handle<js::NativeObject*>, JS::Handle<js::Shape*>, JS::MutableHandle<js::StackShape>) at /home/osboxes/projects/mozilla-central/js/src/vm/Shape.cpp:447
JS::Rooted<js::Shape*>::set(js::Shape*) at /home/osboxes/projects/mozilla-central/obj-sm/dist/include/js/RootingAPI.h:678
js::NativeObject::putProperty(js::ExclusiveContext*, JS::Handle<js::NativeObject*>, JS::Handle<jsid>, bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>), bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, JS::ObjectOpResult&), unsigned int, unsigned int, unsigned int) at /home/osboxes/projects/mozilla-central/js/src/vm/Shape.cpp:769
mozilla::Array<JS::Rooted<void*>*, 12u>::operator[](unsigned int) at /home/osboxes/projects/mozilla-central/obj-sm/dist/include/mozilla/Array.h:29
js::NativeDefineProperty(js::ExclusiveContext*, JS::Handle<js::NativeObject*>, JS::Handle<jsid>, JS::Handle<JS::PropertyDescriptor>, JS::ObjectOpResult&) at /home/osboxes/projects/mozilla-central/js/src/vm/NativeObject.cpp:1393
js::DefineProperty(js::ExclusiveContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Handle<JS::Value>, bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>), bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, JS::ObjectOpResult&), unsigned int, JS::ObjectOpResult&) at /home/osboxes/projects/mozilla-central/js/src/jsobj.cpp:2714
js::DefineProperty(js::ExclusiveContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Handle<JS::Value>, bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>), bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, JS::ObjectOpResult&), unsigned int) at /home/osboxes/projects/mozilla-central/js/src/jsobj.cpp:2745
js::DefineProperty(js::ExclusiveContext*, JS::Handle<JSObject*>, js::PropertyName*, JS::Handle<JS::Value>, bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>), bool (*)(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>, JS::ObjectOpResult&), unsigned int) at /home/osboxes/projects/mozilla-central/js/src/jsobj.cpp:2760
intl_availableLocales(JSContext*, int (*)(), char const* (*)(int), JS::MutableHandle<JS::Value>) at /home/osboxes/projects/mozilla-central/js/src/builtin/Intl.cpp:527

[allocation 4129]
js::oom::ShouldFailWithOOM() [clone .part.27] at /usr/include/i386-linux-gnu/bits/stdio2.h:98
js::LifoAlloc::alloc(unsigned int) at /home/osboxes/projects/mozilla-central/js/src/ds/LifoAlloc.h:273
js::jit::DoSetPropFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICSetProp_Fallback*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>) at /home/osboxes/projects/mozilla-central/js/src/jit/BaselineIC.cpp:4310

after that, it hits assertion failure.

Assertion failure: args[0].isString(), at /home/osboxes/projects/mozilla-central/js/src/builtin/Intl.cpp:835


So, even with reproduced case, OOM happens in almost random place. and it's not exhaustive.
So, it won't be reasonable to search minimal testcase, and it's better landing the patch without testcase.
maybe we should solve those situation and make oomTest more reliable, but it should be addressed in separated bug.

jorendorff, can I have your opinion?
Flags: needinfo?(jorendorff)
You're right. Land it.
Flags: needinfo?(jorendorff)
https://hg.mozilla.org/mozilla-central/rev/31d62c0b92a9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Group: javascript-core-security → core-security-release
Why did this land without sec-approval? Was 48 unaffected?

Otherwise, sec-approval? should have been set and the template questions answered before approval was given.

https://wiki.mozilla.org/Security/Bug_Approval_Process

We still need the questions answered assuming that this went in when it shouldn't have. Can you set sec-approval? on the patch and answer the questions?
Flags: needinfo?(arai.unmht)
The flipped ordering of lines in the JS dates back to bug 1175347, pegged to Mozilla 42.  The *consequences* of that flipped ordering, in case of OOM, depend on exactly where an expected-string goes but instead computes to |undefined|.

Exactly what those consequences are, would require knowing exactly how these locale values are consumed.  That question requires understanding of the history of this code, in ways that I'm not prepared to elucidate with all that much confidence -- it's all quasi-ancient history, as far as my memory goes.

The problem and fix are simple.  Given that, I don't think it's worth the trouble to re-understand hg logs enough to conclude downstream use of an unexpected |undefined| instead of a string is safe or unsafe.  We should just fix this back to 42 and move on with life.
Sorry, my understanding was that this flaw was from bug 1268138.
The underlying issue was from bug 1175347, but it was not possible to hit the flaw before bug 1268138.

The patch in bug 1268138 was landed to 49 (2016-04-30) and uplifted to 48 (2016-05-03), but it was soon backed out from 48 by bug 1265307 (2016-05-06).
So, at the point of this bug reported (2016-05-13), the affected range of bug 1268138 was nightly-only.

Will request sec-approval shortly.
Comment on attachment 8752068 [details] [diff] [review]
Store locale cache data in more robust order.

[Security approval request comment]
> How easily could an exploit be constructed based on the patch?

It's hard to hit this from web content (even with privileged code).

There's one testcase that could hit this, but it requires testing-only functions/flags that needs js shell or chrome privilege to force JIT compilation and OOM, and requires very long most-unrelated code to setup the GC status.

> Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Not much.
It mentions robustness, but doesn't say what could happen with it, nor how to hit it.

> Which older supported branches are affected by this flaw?

The issue was introduced by bug 1175347 in 42 branch,
and exposed by bug 1268138 in 49 branch.

> If not all supported branches, which bug introduced the flaw?

bug 1175347 and bug 1268138.

> Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

It can be prepared easily.

> How likely is this patch to cause regressions; how much testing does it need?

Less likely.
It just flipped the 2 lines of JS code in initialization code.
The change have no effect on Interepreter execution, and initialization code won't be executed in JIT unless some other thing goes wrong.
Flags: needinfo?(arai.unmht)
Attachment #8752068 - Flags: sec-approval?
So this is effectively 49 and 50 only, based on comment 41?
Flags: needinfo?(arai.unmht)
(In reply to Al Billings [:abillings] from comment #43)
> So this is effectively 49 and 50 only, based on comment 41?

yes
Flags: needinfo?(arai.unmht)
Comment on attachment 8752068 [details] [diff] [review]
Store locale cache data in more robust order.

Let's get a patch nominated for aurora too so we can fix it both affected places.
Attachment #8752068 - Flags: sec-approval? → sec-approval+
thanks :)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: