GC: only decommit pages when the page size matches the arena size at runtime

RESOLVED FIXED in mozilla24



6 years ago
5 years ago


(Reporter: terrence, Assigned: terrence)



Firefox Tracking Flags

(Not tracked)



(2 attachments, 3 obsolete attachments)



6 years ago
On at least PPC, the dynamic nature of the page size ensures that we will never be able to perfectly pin the arena size to the page size. We should just cache the page size in js_InitGC and check if it is the same as the arena size before entering the decommit loop. This should solve our intermittent build problems on various platforms and greatly simplify the arena size definition for effectively zero runtime cost.

Comment 1

6 years ago
Created attachment 712614 [details] [diff] [review]
Attachment #712614 - Flags: review?(wmccloskey)
Attachment #712614 - Flags: feedback?(landry)
Atm m-c doesnt build for me on ppc32/openbsd, but that looks unrelated to that diff :

 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp: In function 'bool js::XDRAtom(js::XDRState<mode>*, js::MutableHandleAtom)':
 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp:519:42: error: no matching function for call to 'AtomizeChars(JSContext*&, jschar*&, uint32_t&)'
 5:03.08 /home/landry/src/mozilla-central/js/src/jsatom.cpp:519:42: note: candidate is:
 5:03.09 /home/landry/src/mozilla-central/js/src/jsatom.cpp:406:1: note: template<js::AllowGC allowGC> JSAtom* js::AtomizeChars(JSContext*, const jschar*, size_t, js::InternBehavior)

does it ring a bell to anyone ? that extra js::ib arg doesnt seem present in any of the calls to atomizechars.
Comment on attachment 712614 [details] [diff] [review]

Fwiw with your patch (and the one from #840611) the js shell builds fine on openbsd/ppc32, and only one test fails :

TEST-UNEXPECTED-FAIL | testDebugger_throwHook | CHECK failed: called

i'd have to double check but i'm pretty sure this is unrelated to the diff.
Attachment #712614 - Flags: feedback?(landry) → feedback+

Comment 4

6 years ago
Thanks for testing! The Debugger failure does seem unrelated. If you attach a backtrace I'll take a look.

Comment 5

6 years ago
Created attachment 713079 [details] [diff] [review]
v0 (backport to mozilla18)

I backported Terrence's patch to Mozilla18 and tested it on PPC64. It builds fine both 32 and 64 bits. However, after browsing a few web pages, the JS GC Helper thread gets stuck in an infinite loop in js::DecommitArenasFromAvailableList(). Here is the backtrace:

#0  js::DecommitArenasFromAvailableList (rt=rt@entry=0x3fffa1430000, availableListHeadp=availableListHeadp@entry=0x3fffa1430138)
    at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2816
#1  0x00003fffb23f49cc in DecommitArenas (rt=0x3fffa1430000) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2844
#2  ExpireChunksAndArenas (shouldShrink=true, rt=0x3fffa1430000) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:2858
#3  js::GCHelperThread::doSweep (this=this@entry=0x3fffa1430bc0) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:3168
#4  0x00003fffb23f4bdc in js::GCHelperThread::threadLoop (this=0x3fffa1430bc0) at /usr/src/debug/xulrunner-18.0.2/mozilla-release/js/src/jsgc.cpp:3001
#5  0x00003fffb22ad3f8 in ?? () from /lib64/libnspr4.so
#6  0x00003fffb2e9c604 in start_thread (arg=0x3fffa13ff1c0) at pthread_create.c:310
#7  0x00003fffb2aaedb4 in .__clone () from /lib64/libc.so.6
Can't say if it's the same crash - here (openbsd/ppc32) after building m-c with the patch i've been able to load maps.google.fr, the nightly startup page and this bug page - firefox crashed with an OOM. I'm not able to load the core in gdb since libxul itself is >960mo and i only have 1gb physmem on this macmini, but i dont think the OOM is related to the patch itself.
Will test next patch :)

Comment 7

6 years ago
Comment on attachment 712614 [details] [diff] [review]

Review of attachment 712614 [details] [diff] [review]:

Applying my suggestions Firefox works fine on PPC64, otherwise it freezes after a while (JS GC Helper thread using 100% CPU stuck in DecommitArenasFromAvailableList()).

::: js/src/gc/Memory.cpp
@@ -97,1 @@
>          return false;

Shouldn't it return true (success) when decommit is disabled?

@@ -371,1 @@
>          return false;

Same here, return true when decommit is disabled.

Comment 8

6 years ago
(In reply to Gustavo Luiz Duarte from comment #7)
> Comment on attachment 712614 [details] [diff] [review]
> v0
> Review of attachment 712614 [details] [diff] [review]:
> -----------------------------------------------------------------
> Applying my suggestions Firefox works fine on PPC64, otherwise it freezes
> after a while (JS GC Helper thread using 100% CPU stuck in
> DecommitArenasFromAvailableList()).
> ::: js/src/gc/Memory.cpp
> @@ -97,1 @@
> >          return false;
> Shouldn't it return true (success) when decommit is disabled?

Well, no, not really: decommit is allowed to fail, in which case we do what we did before we implemented decommit. If we "succeed" the decommit here then the memory manager will treat the pages as decommitted when they are not. I think in practice this will just lead to the wrong numbers in about:memory -- not the end of the world. More importantly, this probably reflects a real problem with our memory management that we need to track down.
Comment on attachment 712614 [details] [diff] [review]

Review of attachment 712614 [details] [diff] [review]:

::: js/src/gc/Memory.h
@@ +15,5 @@
>  namespace gc {
>  // Sanity check that our compiled configuration matches the currently running
>  // instance and initialize any runtime data needed for allocation.
> +void InitMemorySubsystem(JSRuntime *rt);

The return type should go on a separate line in this function and the ones below. Can you fix?

::: js/src/jsgc.cpp
@@ +471,5 @@
>      }
>  }
>  static inline Chunk *
> +AllocChunk(JSRuntime *rt) {

Can you move the brace to its own line?

@@ +477,4 @@
>  }
>  static inline void
> +FreeChunk(JSRuntime *rt, Chunk *p) {

Same here.

@@ +585,5 @@
>  /* static */ Chunk *
>  Chunk::allocate(JSRuntime *rt)
>  {
> +    Chunk *chunk = static_cast<Chunk *>(AllocChunk(rt));

The cast seems unnecessary. Please remove.

@@ +598,5 @@
>      // This leaks all of these poisoned pointers. It would be better if they
>      // were marked as uncommitted, but it's a little complicated to avoid
>      // clobbering pre-existing unrelated mappings.
>      while (IsPoisonedPtr(chunk))
> +        chunk = static_cast<Chunk *>(AllocChunk(rt));

Same here.

@@ +731,5 @@
>      return -1;
>  }
>  ArenaHeader *
> +Chunk::fetchNextDecommittedArena(JSRuntime *rt)

Now that chunks store the runtime, you don't need to add this extra parameter.

@@ +2117,5 @@
>              if (ok) {
>                  ++chunk->info.numArenasFree;
>                  chunk->decommittedArenas.set(arenaIndex);
>              } else {
>                  chunk->addArenaToFreeList(rt, aheader);

This code is inside a loop that does:
  while (chunk->info.numArenasFreeCommitted != 0) {
That's why we never terminate if the decommit fails.

I think we should change the statement:
  if (rt->gcChunkAllocationSinceLastGC) {
to this:
  if (rt->gcChunkAllocationSinceLastGC || !ok) {
That should fix the infinite looping.

In addition, please add a check to the top of this function so that we return immediately if decommitting is disabled.
Attachment #712614 - Flags: review?(wmccloskey)

Comment 10

5 years ago
I like this!
Note that everything (at least _LP64) with ArenaShift != 12 will imediately abort at runtime with the current code (offset of bitmap in struct Arena does not match the precalculated magic).


5 years ago
Duplicate of this bug: 825165

Comment 12

5 years ago
Created attachment 756805 [details] [diff] [review]
v1: Applied Bill's review comments.

Luke, I'm flagging you for review on this because AsmJS recently added a dependency on the fixed PageSize that this patch removes. I've moved the current definition to AsmJS.h as AsmJSPageSize and fixed it to 4096. I assume we don't compile asm.js on any of the platforms affected by this bug anyway, so I think this will probably be fine.
Attachment #712614 - Attachment is obsolete: true
Attachment #756805 - Flags: review?(luke)
Using att #756805 (and removing word from BE64 in Value.h, see last comment in bug 618485) allows me to build the js engine on OpenBSD/sparc64. As stated by martin comment 10, jsapi-tests immediatly blows at runtime with a bus error, but it's an improvement over a non-building tree.

Comment 14

5 years ago
Landry: you missunderstood my comment. With this change, it does not blow up any more, as ArenaShift is fixed to 12 and makes the precalculated offsets match the real layout.
The bus errors you see are probably due to some of the other tickets I recently filed which you do not have in your tree yet.
Right. With that word struct member removed from Value.h on line 324, att #756805 from that bug & patches from bugs # 870325 and 871444, firefox from trunk _runs_ on OpenBSD/sparc64.


User Agent: Mozilla/5.0 (X11; OpenBSD sparc64; rv:24.0) Gecko/20130602 Firefox/24.0

It even loads google maps without crashing, and besides the favicons & the globe on about:home having this weird blueish colors, it seems to be stable. Just awesome.

jsapi-tests also succeeds :
Passed: ran 113 tests.

Comment 17

5 years ago
Could we instead have JS_HAS_FIXED_PAGED_SIZE, put PageSize inside #ifdef JS_HAS_FIXED_PAGE_SIZE, use this to disable asm.js compilation (easy, at the head of js::CompileAsmJS) as well as GC decommitting?

Comment 18

5 years ago
Note that there are architectures where we have a fixed page size but it is != 4k. I would love to get asm.js going there as well, but it is not a high priority target for me right now - enough other issues to resolve first.

Comment 19

5 years ago
Bigger or smaller than 4k?

Comment 20

5 years ago
sparc64 uses 8k (the hardware offers various larger sizes as well, but they are typically used for mapping framebuffers or the like, but not arbitrary memory).

All that is needed, most likely, is to fix some offset calculations, but I couldn't see how they all play together, and the patch from Terence was the quickest way out for now.
Same for us on OpenBSD, sparc64 uses 8k, hppa, amd64, ppc(32) & i386 uses 4k..

Comment 22

5 years ago
It is not clear to me what "fixed page size" means? On Fedora we have the page size fixed on 64k for ppc64. Event though we don't ship 4k page size kernels for ppc64, the user can always build a custom 4k page size kernel. Does it mean linux/ppc64 should have JS_HAS_FIXED_PAGE_SIZE=0 ?
Since one could configure transparent huge pages for x86_64, does it mean linux/x86_64 should also have JS_HAS_FIXED_PAGE_SIZE=0 ?

Comment 23

5 years ago
Created attachment 764459 [details] [diff] [review]

Luke and I discussed this offline today: it turns out that AsmJS and the GC really do want the same basic mechanism. We're going to go forward with the current strategy, adding another test to CompileAsmJS to exclude systems with a non-standard page size from AsmJS compilation.
Attachment #756805 - Attachment is obsolete: true
Attachment #756805 - Flags: review?(luke)
Attachment #764459 - Flags: review?(luke)

Comment 24

5 years ago
You sure that is the right patch?

Comment 25

5 years ago
Comment on attachment 764459 [details] [diff] [review]

D'oh! Seems not.
Attachment #764459 - Attachment is obsolete: true
Attachment #764459 - Flags: review?(luke)

Comment 26

5 years ago
Created attachment 764488 [details] [diff] [review]
v2: again

Sorry for the confusion!
Attachment #764488 - Flags: review?(luke)

Comment 27

5 years ago
Comment on attachment 764488 [details] [diff] [review]
v2: again

Review of attachment 764488 [details] [diff] [review]:

Nice job.

::: js/src/ion/AsmJS.cpp
@@ +6174,5 @@
>      if (!JSC::MacroAssembler().supportsFloatingPoint())
>          return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by lack of floating point support");
> +    if (cx->runtime()->gcSystemPageSize != AsmJSPageSize)
> +        return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by non-standard page size");

Probably also need to the granularity too?  (I know we assume that)

::: js/src/ion/AsmJS.h
@@ +89,5 @@
>      void setResumePC(void *pc) { resumePC_ = pc; }
>  };
> +// Asm.js currently assumes a page size of 4096.
> +const size_t AsmJSPageSize = 4096;

Comment rather restated the code.  How about: "// The assumed page size; dynamically checked in CompileAsmJS."
Attachment #764488 - Flags: review?(luke) → review+

Comment 28

5 years ago
(In reply to Luke Wagner [:luke] from comment #27)
> > +    if (cx->runtime()->gcSystemPageSize != AsmJSPageSize)
> > +        return Warn(cx, JSMSG_USE_ASM_TYPE_FAIL, "Disabled by non-standard page size");
> Probably also need to the granularity too?  (I know we assume that)

Luke and I discussed this on IRC, re-posting here for posterity: the reason we have allocation granularity in the GC is that this is usually 64K on windows, rather than the page size. We use it to assert that we are never wasting space: a serious concern for the GC, but less important for AsmJS.
Backed out in http://hg.mozilla.org/integration/mozilla-inbound/rev/37392ce15efa on the assumption that burning ggc like https://tbpl.mozilla.org/php/getParsedLog.php?id=24447753&tree=Mozilla-Inbound was not what you wanted to do.

Comment 31

5 years ago
Nope, I forgot to update ggc to use the new memory api.

Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
You need to log in before you can comment on or make changes to this bug.