Closed Bug 1522294 Opened 9 months ago Closed 8 months ago

Assertion failure: regionStart && OffsetFromAligned(regionStart, alignment) != 0, at js/src/gc/Memory.cpp:662

Categories

(Core :: JavaScript Engine, defect, critical)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: decoder, Assigned: ehoogeveen)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [jsbugmon:update])

Attachments

(1 file, 1 obsolete file)

The following testcase crashes on mozilla-central revision 666abafd77b1 (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --without-intl-api --enable-optimize --target=i686-pc-linux-gnu, run with --fuzzing-safe --cpu-count=2 --ion-offthread-compile=off):

evaluate(`
  function testCALLELEM() {
    function f() {}
    evalInWorker("newGlobal().offThreadCompileScript('{}');");
    testCALLELEM();
  }
  testCALLELEM()
`);

Backtrace:

received signal SIGSEGV, Segmentation fault.
#0 js::gc::TryToAlignChunk<>(void **, void **, size_t, size_t) (aRegion=aRegion@entry=0xa2bcef8, aRetainedRegion=aRetainedRegion@entry=0xa2bcf04, length=length@entry=1048576, alignment=1048576) at js/src/gc/Memory.cpp:662
#1 0x56e82aa8 in js::gc::MapAlignedPagesLastDitch (length=length@entry=1048576, alignment=alignment@entry=1048576) at js/src/gc/Memory.cpp:592
#2 0x56e832c4 in js::gc::MapAlignedPages (length=1048576, alignment=1048576) at js/src/gc/Memory.cpp:447
#3 0x56e03f91 in js::gc::Chunk::allocate (rt=0xf44d8000) at js/src/gc/Allocator.cpp:721
#4 0x56e0e8ca in js::gc::GCRuntime::getOrAllocChunk (this=0xf44d83c8, lock=...) at js/src/gc/Allocator.cpp:653
#5 0x56e0e98f in js::gc::GCRuntime::pickChunk (this=0xf44d83c8, lock=...) at js/src/gc/Allocator.cpp:678
#6 0x56e17015 in js::gc::ArenaLists::refillFreeListAndAllocate (this=0xcce72060, freeLists=..., thingKind=js::gc::AllocKind::JITCODE, checkThresholds=js::gc::ShouldCheckThresholds::CheckThresholds) at js/src/gc/Allocator.cpp:477
#7 0x56e17273 in js::gc::GCRuntime::refillFreeListFromMainThread (cx=0xcce71800, thingKind=js::gc::AllocKind::JITCODE) at js/src/gc/Allocator.cpp:422
#8 0x56e6597b in js::gc::GCRuntime::tryNewTenuredThing<js::jit::JitCode, (js::AllowGC)0> (thingSize=32, kind=js::gc::AllocKind::JITCODE, cx=0xcce71800) at js/src/gc/Allocator.cpp:271
#9 js::Allocate<js::jit::JitCode, (js::AllowGC)0> (cx=0xcce71800) at js/src/gc/Allocator.cpp:252
#10 0x570e6482 in js::jit::JitCode::New<(js::AllowGC)0> (cx=0xcce71800, code=0x38268010 "", bufferSize=60600, headerSize=16, pool=0xec4d3070, kind=js::jit::CodeKind::Other) at js/src/jit/Ion.cpp:653
#11 0x5712876c in js::jit::Linker::newCode (this=0xa2bd2bc, cx=0xcce71800, kind=js::jit::CodeKind::Other) at js/src/jit/Linker.cpp:54
#12 0x570dcd44 in js::jit::JitRuntime::initialize (this=<optimized out>, cx=<optimized out>) at js/src/jit/Ion.cpp:306
#13 0x56a7eed7 in JSRuntime::createJitRuntime (this=0xf44d8000, cx=0xcce71800) at js/src/vm/Realm.cpp:146
#14 0x56c761aa in JS::InitSelfHostedCode (cx=0xcce71800) at js/src/jsapi.cpp:443
#15 0x567594d0 in WorkerMain (input=<optimized out>) at js/src/shell/js.cpp:3960
[...]
#19 0xf7cc441e in clone () from /lib32/libc.so.6
eax 0x57bffa54 1472199252
ebx 0xa2bcf04 170643204
ecx 0xf7d90864 -136771484
edx 0x57781f64 1467490148
esi 0x100000 1048576
edi 0x57bfeff4 1472196596
ebp 0xa2bcec8 170643144
esp 0xa2bce90 170643088
eip 0x56e819d4 <js::gc::TryToAlignChunk<>(void **, void **, size_t, size_t)+388>
=> 0x56e819d4 <js::gc::TryToAlignChunk<>(void **, void **, size_t, size_t)+388>: movl $0x0,0x0
0x56e819de <js::gc::TryToAlignChunk<>(void **, void **, size_t, size_t)+398>: ud2

This could be shell-only as it seems to be related to over-recursion with the worker implementation in the shell.

Steve, is this regression bug something you could quickly look at some point soon and investigate. Or if you don't have time could you help suggest someone else on our team I could have look at this.

Flags: needinfo?(sphink)
Flags: needinfo?(sphink)

Hmm, I'm not sure what the deal is with this one but I wonder if it's related to bug 1520496. I may have introduced a bug in TryToAlignChunk() that's either rare or being masked by using the new allocator by default (or both). I'll look into this tomorrow along with bug 1520496.

I haven't caught this in gdb or rr. Whatever it's doing mangles things enough that it crashes both, in different ways. I can get a core file, but it doesn't have symbols (or a valid stack).

It's something with stack exhaustion or something. Running it starts and stops a couple thousand threads. I added a printout just before the assert, which gave regionStart=0xdeb00000, alignment=1048576, offset=0. So an already-aligned chunk is slipping in?

The only way I see of that happening is if a previous call to TryToAlignChunk set region to an aligned value but returned false.

Oh! ehoogeveen, would this do it? TryToAlignChunk runs, and maps another unaligned region. It sets result to false. AlwaysGetNew is the default, true, so it does another MapMemory. Say this returns an aligned region. Then wouldn't you end up passing back an aligned pointer in &region, but returning false and trying again, hitting this assert?

Simply re-setting 'result' based on the alignment doesn't work; it hits MOZ_ASSERT(!tempMaps[attempt]) (it doesn't think we should have filled in tempMaps[attempt] if we succeeded.) If I "fix" that (with if (tempMaps[attempt]) attempt++; and commenting out the assert), it "works". But I'll let you figure out how to do it the proper way.

Blocks: 1502733
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]

I don't think comment 4 is right, the testcase may have been intermittent or something.

Thanks for the analysis, Steve - you're right, we weren't handling naturally aligned chunks properly. While testing this I also noticed that TestMapAlignedPagesLastDitch() was still disabled on 64-bit platforms. That probably caused the failures in bug 1520496, though I still want to look into why we're falling back to the old allocator at all.

PS: Sorry for sitting on this for so long, I've had some health issues recently so I haven't had much energy for things outside of my job.

Assignee: nobody → emanuel.hoogeveen
Attachment #9044723 - Flags: review?(sphink)
Status: NEW → ASSIGNED
Blocks: 1520496

I realized I could extend the testGCAllocator jsapi-test to catch this bug, so I did. MapAlignedPages() was dodging this bug by ignoring the return value of TryToAlignChunk(), but that was suboptimal so I changed it (now two tests in testGCAllocator fail without the change in TryToAlignChunk()).

I also added a few more (debug only) assertions to make sure TryToAlignChunk() works correctly, and added some much needed comments to MapAlignedPages() while I was there.

Attachment #9044723 - Attachment is obsolete: true
Attachment #9044723 - Flags: review?(sphink)
Attachment #9044774 - Flags: review?(sphink)
Comment on attachment 9044774 [details] [diff] [review]
v2 - Do the right thing if we happen to get an aligned region in TryToAlignChunk.

Review of attachment 9044774 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, that nicely fixes it. Thank you very much for your work here.
Attachment #9044774 - Flags: review?(sphink) → review+

Thanks for the review!

Here's a try run with a change on top to disable the new allocator (to force testing the old one): https://treeherder.mozilla.org/#/jobs?repo=try&revision=cdc39223bfc2529861e090486cd6448be06f5e26

And a regular try run with just this change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=28d36a68c4a4c2de1af2a447631ccc556a7913d8

Both are green, so I think this is ready to go :)

Keywords: checkin-needed

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee8c2687d7f5
Do the right thing if we happen to get an aligned region in TryToAlignChunk. r=sfink

Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Is this something we should consider backporting to Beta or can it ride the trains?

Flags: needinfo?(emanuel.hoogeveen)
Flags: in-testsuite+

Comment on attachment 9044774 [details] [diff] [review]
v2 - Do the right thing if we happen to get an aligned region in TryToAlignChunk.

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1502733
  • User impact if declined: This bug partially breaks our 'last ditch' allocator (except on Windows, which is unaffected) which is used in low memory situations. As a result 32-bit platforms and devices with low physical memory could run out of memory more quickly, though it's hard to judge the real world effect of this change. This patch also fixes the intermittent test failure tracked by bug 1520496, and might reduce noise for fuzzers.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This is not a one-line fix, but the changes are contained to two functions that are heavily tested by the JS engine. The patch includes a regression test for the broken behavior identified in this bug.
  • String changes made/needed:
Flags: needinfo?(emanuel.hoogeveen)
Attachment #9044774 - Flags: approval-mozilla-beta?

Hi Ryan, thanks for the reminder. I wasn't sure whether to nominate this patch as it's getting pretty late in the cycle, but I think the included fix for bug 1520496 makes it worth taking.

Comment on attachment 9044774 [details] [diff] [review]
v2 - Do the right thing if we happen to get an aligned region in TryToAlignChunk.

Potential improvements for OOM crashes, beta fuzzing. Has test coverage.
Let's uplift for beta 12.
Attachment #9044774 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.