Closed Bug 846670 Opened 7 years ago Closed 7 years ago

Assertion failure: stackMin <= stackEnd sometimes seen in xpcshell

Categories

(Core :: JavaScript Engine, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla22

People

(Reporter: dhylands, Assigned: dhylands)

References

Details

Attachments

(1 file, 2 obsolete files)

Under bionic, the size of the main thread stack is hard-coded at 128K
See line 67 of bionic/libc/bionic/libc_init_common.c

I've been noticing the following:
Assertion failure: stackMin <= stackEnd, at /home/work/B2G-unagi/mozilla-inbound/js/src/gc/RootMarking.cpp:325

when running xpcshell in a debug build on my unagi. I traced this back to the stack overflowing the 128K stack from the main thread. So this is the stack size of the xpcshell executable.

Some observations: There is no guard page for the main thread stack. All other threads seem to get a 1M  stack (see line 83 of bionic/libc/bionic/pthread.c

I confirmed that the b2g main thread has a 128K stack.

Presumably this applies to the android firefox browser as well.
This sounds really bad. With the right JS we could easily pass this limit, right?
blocking-b2g: --- → tef?
It looks like the main thread stack can grow dynamically up to about 8Mb

Around the time the stack got that big, maps reported:

be19e000-be999000 rw-p 00000000 00:00 0          [stack]

which is a size of 0x7fb000 (8172K)

Reporting from within the program itself reported:

stackBase: 0xbe979000 stackSize: 0x00020000 stackEnd: 0xbe999000 stackPtr: 0xbe19f308

stackBase and stackSize come from the pthread_attr_getstack call, so these don't get updated as the stack grows.

So the problem I'm observing isn't caused by the stack being limited, but by something else. (because the assert I was observing happens when the top of stackMin gets > stackEnd which is an issue on the high side of the stack, not the low side)
I've updated the description to reflect more accurately what I observed
Summary: B2G MainThread stacks are too small → Assertion failure: stackMin <= stackEnd sometimes seen in xpcshell
The observation of the stack being around 8Mb was from a test program I wrote to play with stacks, and not B2G (just to be clear).
The 8Mb limit is determined by ulimit -s (8192 Kb). You can use

ulimit -s 16384

to increase the max stack (of the main thread) to 16 Mb, and that also seems to working fine.
And since I had the test program, stacks for non-main threads are not dynamic. Their size is determined by the stack size specified in the pthread attrs at thread creation time.

The default size is 1Mb.
So the root cause seems to be that pthread_attr_getstack isn't telling the truth for the main thread.

I added some code in bionic's __libc_init_common function to stash away stacktop, stackbottom, stacksize, and the results of __get_sp and print them as the first line of code in xpcshell's main function:

_stackbottom = 0xbea1b000 _stacksize = 0x00020000
         _sp = 0xbea3af50 
   _stacktop = 0xbea3b000

So far everything looks nice and consistent. Calling pthread_attr_getstack returns:

pthread_attr_getstack stackBase = 0xbea1b000
pthread_attr_getstack stackSize = 0x00020000

The kicker is that when main is called, __get_sp returns 0xbea3b600 which is outside the stacktop/stackbottom.

I also wrote some code to grab the [stack] section from /proc/self/maps at xpcshell main time. It reports:

bea1b000-bea3c000 rw-p 00000000 00:00 0          [stack]

So stacktop should really have been 0xbea3c000 instead of 0xbea3b000.

Later, when the ASSERT happens, nativeStackTop = 0xbea3b4f0 which is within the stack, but is outside the stackbottom/stacktop range.

From this we can conclude that __libc_init_common must have a bunch of stuff on the stack that isn't on the stack when main is called.
Assignee: nobody → general
Component: General → JavaScript Engine
Product: Boot2Gecko → Core
Assignee: general → dhylands
This patch causes the stack bounds to be determined properly for the main-thread, by scanning /proc/self/maps.

https://tbpl.mozilla.org/?tree=Try&rev=3f5936accc9e
Attachment #720168 - Flags: review?(wmccloskey)
Oops. Now builds debug and non-debug. New try submission:

https://tbpl.mozilla.org/?tree=Try&rev=09f05359f8ba
Attachment #720168 - Attachment is obsolete: true
Attachment #720168 - Flags: review?(wmccloskey)
Attachment #720198 - Flags: review?(wmccloskey)
New try run with #include's added

https://tbpl.mozilla.org/?tree=Try&rev=163d253781a7

I'll wait until its green before updating the attached patch.
I'll also have to see why B2G didn't generate the same errors as android when building.
(Clearing tef?.  "when running xpcshell in a debug build on my unagi" doesn't sound like this issue can affect a release v1.0.1 build.  But if it can then please re-nom)
blocking-b2g: tef? → ---
The conservative stack scanner in our JS engine depends on the correctness of the values here.
We also only care about sections of the stack that are gecko related and can contain pointers into our JS heap. 
So if it turns out that this section can't contain any valid heap pointers, we shouldn't block on it.
Comment on attachment 720198 [details] [diff] [review]
Fix compile problems with non-debug builds

Gregor found a patch which does this in a slightly cleaner manner, so I'm going to rework  bit.

http://repository.timesys.com/buildsources/q/qt-embedded-linux/qt-embedded-linux-4.8.1/qt-embedded-linux-4.8.1-javascript-uclibc-0932.patch

Thee key difference is that  it looks for a section based on address, where mine looks for [stack]. I like find the section base on address better.
Attachment #720198 - Flags: review?(wmccloskey)
Attachment #720198 - Attachment is obsolete: true
Comment on attachment 722370 [details] [diff] [review]
Fix compile problems with non-debug builds. v3

I did a couple of try runs:
https://tbpl.mozilla.org/?tree=Try&rev=70d85faa4046
https://tbpl.mozilla.org/?tree=Try&rev=bbc08cf88247
Attachment #722370 - Flags: review?(wmccloskey)
Comment on attachment 722370 [details] [diff] [review]
Fix compile problems with non-debug builds. v3

Review of attachment 722370 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks nice. It's awesome you tracked this down.

::: js/src/jsnativestack.cpp
@@ +153,5 @@
> +            unsigned int stackAddr = (unsigned int)(unsigned long)&sattr;
> +            while (fgets(line, sizeof(line), fs) != NULL) {
> +                unsigned int stackStart;
> +                unsigned int stackEnd;
> +                if (sscanf(line, "%x-%x ", &stackStart, &stackEnd) != 2) {

In the JS engine, we don't put braces around single-line statements. So please remove these ones.

I guess we don't need to use %lx since ARM is always 32-bit.

@@ +156,5 @@
> +                unsigned int stackEnd;
> +                if (sscanf(line, "%x-%x ", &stackStart, &stackEnd) != 2) {
> +                    continue;
> +                }
> +                if ((stackAddr >= stackStart) && (stackAddr < stackEnd)) {

These extra parens should also be dropped.

@@ +174,2 @@
>  # else
> +   DBG(rc =) pthread_attr_getstack(&sattr, &stackBase, &stackSize);

If this code fails, we're really in trouble. Let's assert !rc even in opt builds. Can you convert JS_ASSERT(!rc) to this:
    if (!rc)
        MOZ_CRASH();

That will avoid all this ugly debug-only stuff, too.
Attachment #722370 - Flags: review?(wmccloskey) → review+
Comment on attachment 722370 [details] [diff] [review]
Fix compile problems with non-debug builds. v3

Review of attachment 722370 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsnativestack.cpp
@@ +153,5 @@
> +            unsigned int stackAddr = (unsigned int)(unsigned long)&sattr;
> +            while (fgets(line, sizeof(line), fs) != NULL) {
> +                unsigned int stackStart;
> +                unsigned int stackEnd;
> +                if (sscanf(line, "%x-%x ", &stackStart, &stackEnd) != 2) {

Surely we have something better than using sscanf to parse input values, seeing as a too-big input for the output type triggers undefined behavior.  :-(

Okay, probably not.  :-(  I'd rather see %lx used everywhere, to be safe.
I could use stroul instead of sscanf. strtoul will parse hex, and will stop on the - and return the pointer to the character that stopped the scanning. Since this is at the beginning of the line, I think it would boil down to something like:

  s = line;
  stackStart = strtoul(s, &endPtr, 16);
  if (stackStart > 0 && *endPtr == '-') {
    s = endPtr + 1;
    stackEnd = strtoul(s, &endPtr, 16);
    if (stackEnd > 0 && *endPtr == ' ' && stackAddr >= stackStart && stackAddr < stackEnd) {
       ...

Even though strtol can parse 0, 0 isn't a valid address for the start of a stack segment on linux.
(In reply to Dave Hylands [:dhylands] from comment #18)
> I could use stroul instead of sscanf. strtoul will parse hex, and will stop
> on the - and return the pointer to the character that stopped the scanning.
> Since this is at the beginning of the line, I think it would boil down to
> something like:
> 
>   s = line;
>   stackStart = strtoul(s, &endPtr, 16);
>   if (stackStart > 0 && *endPtr == '-') {
>     s = endPtr + 1;
>     stackEnd = strtoul(s, &endPtr, 16);
>     if (stackEnd > 0 && *endPtr == ' ' && stackAddr >= stackStart &&
> stackAddr < stackEnd) {
>        ...
> 
> Even though strtol can parse 0, 0 isn't a valid address for the start of a
> stack segment on linux.

That seems strictly worse to me. As long as you use %lx, I don't think there's anything wrong with the scanf.
https://hg.mozilla.org/mozilla-central/rev/8b3c093d11eb
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Blocks: 1307655
You need to log in before you can comment on or make changes to this bug.