Closed Bug 840242 Opened 11 years ago Closed 10 years ago

GC: only decommit pages when the page size matches the arena size at runtime

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla24

People

(Reporter: terrence, Assigned: terrence)

References

Details

Attachments

(2 files, 3 obsolete files)

On at least PPC, the dynamic nature of the page size ensures that we will never be able to perfectly pin the arena size to the page size. We should just cache the page size in js_InitGC and check if it is the same as the arena size before entering the decommit loop. This should solve our intermittent build problems on various platforms and greatly simplify the arena size definition for effectively zero runtime cost.
Attached patch v0 (obsolete) — Splinter Review
Attachment #712614 - Flags: review?(wmccloskey)
Attachment #712614 - Flags: feedback?(landry)
Atm m-c doesnt build for me on ppc32/openbsd, but that looks unrelated to that diff :

 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp: In function 'bool js::XDRAtom(js::XDRState<mode>*, js::MutableHandleAtom)':
 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp:519:42: error: no matching function for call to 'AtomizeChars(JSContext*&, jschar*&, uint32_t&)'
 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp:519:42: note: candidate is:
 5:03.09 /home/landry/src/mozilla-central/js/src/jsatom.cpp:406:1: note: template<js::AllowGC allowGC> JSAtom* js::AtomizeChars(JSContext*, const jschar*, size_t, js::InternBehavior)

does it ring a bell to anyone ? that extra js::ib arg doesnt seem present in any of the calls to atomizechars.
Comment on attachment 712614 [details] [diff] [review]
v0

Fwiw with your patch (and the one from #840611) the js shell builds fine on openbsd/ppc32, and only one test fails :

TEST-UNEXPECTED-FAIL | testDebugger_throwHook | CHECK failed: called

i'd have to double check but i'm pretty sure this is unrelated to the diff.
Attachment #712614 - Flags: feedback?(landry) → feedback+
Thanks for testing! The Debugger failure does seem unrelated. If you attach a backtrace I'll take a look.
I backported Terrence's patch to Mozilla18 and tested it on PPC64. It builds fine both 32 and 64 bits. However, after browsing a few web pages, the JS GC Helper thread gets stuck in an infinite loop in js::DecommitArenasFromAvailableList(). Here is the backtrace:


#0  js::DecommitArenasFromAvailableList (rt=rt@entry=0x3fffa1430000, availableListHeadp=availableListHeadp@entry=0x3fffa1430138)
    at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2816
#1  0x00003fffb23f49cc in DecommitArenas (rt=0x3fffa1430000) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2844
#2  ExpireChunksAndArenas (shouldShrink=true, rt=0x3fffa1430000) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2858
#3  js::GCHelperThread::doSweep (this=this@entry=0x3fffa1430bc0) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:3168
#4  0x00003fffb23f4bdc in js::GCHelperThread::threadLoop (this=0x3fffa1430bc0) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:3001
#5  0x00003fffb22ad3f8 in ?? () from /lib64/libnspr4.so
#6  0x00003fffb2e9c604 in start_thread (arg=0x3fffa13ff1c0) at pthread_create.c:310
#7  0x00003fffb2aaedb4 in .__clone () from /lib64/libc.so.6
Can't say if it's the same crash - here (openbsd/ppc32) after building m-c with the patch i've been able to load maps.google.fr, the nightly startup page and this bug page - firefox crashed with an OOM. I'm not able to load the core in gdb since libxul itself is >960mo and i only have 1gb physmem on this macmini, but i dont think the OOM is related to the patch itself.
Will test next patch :)
Comment on attachment 712614 [details] [diff] [review]
v0

Review of attachment 712614 [details] [diff] [review]:
-----------------------------------------------------------------

Applying my suggestions Firefox works fine on PPC64, otherwise it freezes after a while (JS GC Helper thread using 100% CPU stuck in DecommitArenasFromAvailableList()).

::: js/src/gc/Memory.cpp
@@ -97,1 @@
>          return false;

Shouldn't it return true (success) when decommit is disabled?

@@ -371,1 @@
>          return false;

Same here, return true when decommit is disabled.
(In reply to Gustavo Luiz Duarte from comment #7)
> Comment on attachment 712614 [details] [diff] [review]
> v0
> 
> Review of attachment 712614 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Applying my suggestions Firefox works fine on PPC64, otherwise it freezes
> after a while (JS GC Helper thread using 100% CPU stuck in
> DecommitArenasFromAvailableList()).
> 
> ::: js/src/gc/Memory.cpp
> @@ -97,1 @@
> >          return false;
> 
> Shouldn't it return true (success) when decommit is disabled?

Well, no, not really: decommit is allowed to fail, in which case we do what we did before we implemented decommit. If we "succeed" the decommit here then the memory manager will treat the pages as decommitted when they are not. I think in practice this will just lead to the wrong numbers in about:memory -- not the end of the world. More importantly, this probably reflects a real problem with our memory management that we need to track down.
Comment on attachment 712614 [details] [diff] [review]
v0

Review of attachment 712614 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Memory.h
@@ +15,5 @@
>  namespace gc {
>  
>  // Sanity check that our compiled configuration matches the currently running
>  // instance and initialize any runtime data needed for allocation.
> +void InitMemorySubsystem(JSRuntime *rt);

The return type should go on a separate line in this function and the ones below. Can you fix?

::: js/src/jsgc.cpp
@@ +471,5 @@
>      }
>  }
>  
>  static inline Chunk *
> +AllocChunk(JSRuntime *rt) {

Can you move the brace to its own line?

@@ +477,4 @@
>  }
>  
>  static inline void
> +FreeChunk(JSRuntime *rt, Chunk *p) {

Same here.

@@ +585,5 @@
>  
>  /* static */ Chunk *
>  Chunk::allocate(JSRuntime *rt)
>  {
> +    Chunk *chunk = static_cast<Chunk *>(AllocChunk(rt));

The cast seems unnecessary. Please remove.

@@ +598,5 @@
>      // This leaks all of these poisoned pointers. It would be better if they
>      // were marked as uncommitted, but it's a little complicated to avoid
>      // clobbering pre-existing unrelated mappings.
>      while (IsPoisonedPtr(chunk))
> +        chunk = static_cast<Chunk *>(AllocChunk(rt));

Same here.

@@ +731,5 @@
>      return -1;
>  }
>  
>  ArenaHeader *
> +Chunk::fetchNextDecommittedArena(JSRuntime *rt)

Now that chunks store the runtime, you don't need to add this extra parameter.

@@ +2117,5 @@
>              if (ok) {
>                  ++chunk->info.numArenasFree;
>                  chunk->decommittedArenas.set(arenaIndex);
>              } else {
>                  chunk->addArenaToFreeList(rt, aheader);

This code is inside a loop that does:
  while (chunk->info.numArenasFreeCommitted != 0) {
That's why we never terminate if the decommit fails.

I think we should change the statement:
  if (rt->gcChunkAllocationSinceLastGC) {
to this:
  if (rt->gcChunkAllocationSinceLastGC || !ok) {
That should fix the infinite looping.

In addition, please add a check to the top of this function so that we return immediately if decommitting is disabled.
Attachment #712614 - Flags: review?(wmccloskey)
I like this!
Note that everything (at least _LP64) with ArenaShift != 12 will imediately abort at runtime with the current code (offset of bitmap in struct Arena does not match the precalculated magic).
Luke, I'm flagging you for review on this because AsmJS recently added a dependency on the fixed PageSize that this patch removes. I've moved the current definition to AsmJS.h as AsmJSPageSize and fixed it to 4096. I assume we don't compile asm.js on any of the platforms affected by this bug anyway, so I think this will probably be fine.
Attachment #712614 - Attachment is obsolete: true
Attachment #756805 - Flags: review?(luke)
Using att #756805 (and removing word from BE64 in Value.h, see last comment in bug 618485) allows me to build the js engine on OpenBSD/sparc64. As stated by martin comment 10, jsapi-tests immediatly blows at runtime with a bus error, but it's an improvement over a non-building tree.
Landry: you missunderstood my comment. With this change, it does not blow up any more, as ArenaShift is fixed to 12 and makes the precalculated offsets match the real layout.
The bus errors you see are probably due to some of the other tickets I recently filed which you do not have in your tree yet.
Right. With that word struct member removed from Value.h on line 324, att #756805 from that bug & patches from bugs # 870325 and 871444, firefox from trunk _runs_ on OpenBSD/sparc64.

http://rhaalovely.net/~landry/shared/firefox-24.0a1-sparc64-2.png
http://rhaalovely.net/~landry/shared/firefox-24.0a1-sparc64.png

User Agent: Mozilla/5.0 (X11; OpenBSD sparc64; rv:24.0) Gecko/20130602 Firefox/24.0

It even loads google maps without crashing, and besides the favicons & the globe on about:home having this weird blueish colors, it seems to be stable. Just awesome.

jsapi-tests also succeeds :
Passed: ran 113 tests.
Could we instead have JS_HAS_FIXED_PAGED_SIZE, put PageSize inside #ifdef JS_HAS_FIXED_PAGE_SIZE, use this to disable asm.js compilation (easy, at the head of js::CompileAsmJS) as well as GC decommitting?
Note that there are architectures where we have a fixed page size but it is != 4k. I would love to get asm.js going there as well, but it is not a high priority target for me right now - enough other issues to resolve first.
Bigger or smaller than 4k?
sparc64 uses 8k (the hardware offers various larger sizes as well, but they are typically used for mapping framebuffers or the like, but not arbitrary memory).

All that is needed, most likely, is to fix some offset calculations, but I couldn't see how they all play together, and the patch from Terence was the quickest way out for now.
Same for us on OpenBSD, sparc64 uses 8k, hppa, amd64, ppc(32) & i386 uses 4k..
It is not clear to me what "fixed page size" means? On Fedora we have the page size fixed on 64k for ppc64. Event though we don't ship 4k page size kernels for ppc64, the user can always build a custom 4k page size kernel. Does it mean linux/ppc64 should have JS_HAS_FIXED_PAGE_SIZE=0 ?
Since one could configure transparent huge pages for x86_64, does it mean linux/x86_64 should also have JS_HAS_FIXED_PAGE_SIZE=0 ?
Attached patch v2 (obsolete) — Splinter Review
Luke and I discussed this offline today: it turns out that AsmJS and the GC really do want the same basic mechanism. We're going to go forward with the current strategy, adding another test to CompileAsmJS to exclude systems with a non-standard page size from AsmJS compilation.
Attachment #756805 - Attachment is obsolete: true
Attachment #756805 - Flags: review?(luke)
Attachment #764459 - Flags: review?(luke)
You sure that is the right patch?
Comment on attachment 764459 [details] [diff] [review]
v2

D'oh! Seems not.
Attachment #764459 - Attachment is obsolete: true
Attachment #764459 - Flags: review?(luke)
Attached patch v2: againSplinter Review
Sorry for the confusion!
Attachment #764488 - Flags: review?(luke)
Comment on attachment 764488 [details] [diff] [review]
v2: again

Review of attachment 764488 [details] [diff] [review]:
-----------------------------------------------------------------

Nice job.

::: js/src/ion/AsmJS.cpp
@@ +6174,5 @@
>      if (!JSC::MacroAssembler().supportsFloatingPoint())
>          return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by lack of floating point support");
>  
> +    if (cx->runtime()->gcSystemPageSize != AsmJSPageSize)
> +        return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by non-standard page size");

Probably also need to the granularity too?  (I know we assume that)

::: js/src/ion/AsmJS.h
@@ +89,5 @@
>      void setResumePC(void *pc) { resumePC_ = pc; }
>  };
>  
> +// Asm.js currently assumes a page size of 4096.
> +const size_t AsmJSPageSize = 4096;

Comment rather restated the code.  How about: "// The assumed page size; dynamically checked in CompileAsmJS."
Attachment #764488 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #27)
> > +    if (cx->runtime()->gcSystemPageSize != AsmJSPageSize)
> > +        return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by non-standard page size");
> 
> Probably also need to the granularity too?  (I know we assume that)

Luke and I discussed this on IRC, re-posting here for posterity: the reason we have allocation granularity in the GC is that this is usually 64K on windows, rather than the page size. We use it to assert that we are never wasting space: a serious concern for the GC, but less important for AsmJS.
Nope, I forgot to update ggc to use the new memory api.

https://hg.mozilla.org/integration/mozilla-inbound/rev/49e4ff129351
https://hg.mozilla.org/mozilla-central/rev/49e4ff129351
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
You need to log in before you can comment on or make changes to this bug.