Closed Bug 1257234 Opened 4 years ago Closed 4 years ago

Detect Windows stack size at runtime

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox45 --- wontfix
firefox46 --- fixed
firefox47 --- fixed
firefox48 --- fixed
firefox-esr45 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(1 file)

We've been bumping the stack quota every now and then (most recently bug 1256306), but especially Win64 PGO builds pretty easily hit the stack limit (bug 1247496 and others).

Stack overflow errors usually show up out of the blue (PGO, random code changes, etc), don't happen on all platforms/browsers, and can make websites completely unusable. It'd be really nice to eliminate this class of bugs.

It's possible to use a linker flag to request a larger stack, for Windows see https://msdn.microsoft.com/en-us/library/8cxs58a6.aspx We could do this on Linux and Windows - OS X already has an 8 MB stack.

As Ted mentioned in bug 1256306, we should be careful with background threads - we could audit all of them and make sure we pass an explicit stack size.
And we have another Win64 stack overflow, bug 1257308.

Ted, if I want to add this linker flag to ld/MSVC etc, where should I do that? Is this acceptable, if we audit the background threads?
Flags: needinfo?(ted)
Depends on: 1257308
FWIW I did some digging with VMMap, and MS Edge seems to use a 10 MB stack (dumpbin.exe would be a more direct way to get this info, but this VM doesn't have Visual Studio installed).

Starting with something like 4 MB seems reasonable though.
Does the main thread stack size actually represent a committed size, or is it just the potential for growth? FWIW, if we have any direct calls to CreateThread, _beginthread, or _beginthreadex that need to be updated, we should make sure they pass the STACK_SIZE_PARAM_IS_A_RESERVATION flag so we don't end up committing 1MB for every thread [1]. We might not need to update anything though, unless we find ourselves running out of address space as a result of the bigger reservations.

[1] The flag is somewhat badly described. Without it, the stack size parameter sets the original committed size, which is usually *not* what you want unless you know you're going to be using a lot of stack right away. By passing the flag, you're telling it to reserve that much *address space*, but it won't actually commit the memory except when needed. The documentation for _beginthreadex also doesn't mention the flag, but _beginthreadex just calls CreateThread under the hood.
(In reply to Jan de Mooij [:jandem] from comment #1)
> And we have another Win64 stack overflow, bug 1257308.
> 
> Ted, if I want to add this linker flag to ld/MSVC etc, where should I do
> that? Is this acceptable, if we audit the background threads?

You can just stick it in LDFLAGS in browser/app/moz.build like we do with /HEAP:
https://dxr.mozilla.org/mozilla-central/rev/341344bdec8f10bf50646cd6ef2355361435cbf6/browser/app/moz.build#63

I don't have any particularly vested interest in our memory usage, but someone like njn might.

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #3)
> Does the main thread stack size actually represent a committed size, or is
> it just the potential for growth?

Per https://msdn.microsoft.com/en-us/library/8cxs58a6.aspx, it's the reserved size, and you can also pass a second number as the commit size (which defaults to 4k).
Flags: needinfo?(ted)
What's the current stack size? What's the proposed stack size? How many stacks would be affected?

Emanuel's point in comment 3 is a good one, but on Win32 even address space is relatively precious, so we need to be careful.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> What's the current stack size? What's the proposed stack size? How many
> stacks would be affected?

Windows uses a default stack size of 1 MB. For Win64 PGO builds, that's simply not enough and we keep breaking websites that need more than that (I've seen at least 4 bug reports in the past week alone).

I'm proposing we increase the main thread's stack size from 1 MB to 4 MB. I can audit all places where we create background threads and pass an explicit 1 MB size, so we don't reserve 4 MB there.

Does that sound reasonable?
(In reply to Jan de Mooij [:jandem] from comment #6)
> I'm proposing we increase the main thread's stack size from 1 MB to 4 MB.

Another option is to use 2 MB on 32 bit, and 4-8 MB on 64 bit, since the problem is worse on Win64 and Win32 address space is more precious.

We may need some code to determine the current stack size at runtime, as xpcshell/Thunderbird/etc will probably still use the default 1 MB, but that shouldn't be too hard.
If it's just the main thread, that's fine. Using a smaller size on 32-bit sounds reasonable. Thank you for the extra info.
Blocks: 1257308
No longer depends on: 1257308
(In reply to Jan de Mooij [:jandem] from comment #7)
> We may need some code to determine the current stack size at runtime, as
> xpcshell/Thunderbird/etc will probably still use the default 1 MB, but that
> shouldn't be too hard.

We should also bump the limit for xpcshell:
https://dxr.mozilla.org/mozilla-central/source/js/xpconnect/shell/moz.build

From some brief searching, it looks like with a few calls to VirtualQuery you can determine the stack extents:
http://stackoverflow.com/a/1747499/69326
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Ah, we're already using a 2 MB stack on Win64!

https://dxr.mozilla.org/mozilla-central/source/config/config.mk#380

That's great because it means a patch to make the stack quota code detect this should be all we need short-term. We should probably uplift that even.

Then as a follow up we can consider using 4 MB and/or changing Win32 to use 2 MB as well, but that's less urgent.
Attached patch PatchSplinter Review
This adds some code to get the stack size at runtime. I added some logging and this indeed results in 2 MB on Win64, 1 MB on Win32.
Attachment #8731681 - Flags: review?(ted)
Summary: Use linker flags to get larger stacks on Windows and Linux → Detect Windows stack size at runtime
Oh, hah, I totally didn't see that!
Comment on attachment 8731681 [details] [diff] [review]
Patch

Review of attachment 8731681 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think I'm a peer of this code but this looks sensible to me.
Attachment #8731681 - Flags: review?(ted) → review+
https://hg.mozilla.org/mozilla-central/rev/9e117944cd9f
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8731681 [details] [diff] [review]
Patch

Approval Request Comment
[Feature/regressing bug #]: Windows 64-bit builds.

[User impact if declined]: Broken websites with Win64 builds, including popular websites like Vine, see bug 1227035, bug 1257308, bug 1247496. Also, more users will run into this as they switch to Win64 builds.

[Describe test coverage new/current, TreeHerder]: Tested on Treeherder, in a Nightly or two.

[Risks and why]: Low risk. Windows 64-bit builds already allocate a 2 MB stack, with this patch we can use all of that instead of limiting ourselves to just 1 MB.

[String/UUID change made/needed]: None.
Attachment #8731681 - Flags: approval-mozilla-esr45?
Attachment #8731681 - Flags: approval-mozilla-beta?
Attachment #8731681 - Flags: approval-mozilla-aurora?
Comment on attachment 8731681 [details] [diff] [review]
Patch

Improve the windows 64 support, taking it.
Should be in 46 beta 5
Attachment #8731681 - Flags: approval-mozilla-esr45?
Attachment #8731681 - Flags: approval-mozilla-esr45+
Attachment #8731681 - Flags: approval-mozilla-beta?
Attachment #8731681 - Flags: approval-mozilla-beta+
Attachment #8731681 - Flags: approval-mozilla-aurora?
Attachment #8731681 - Flags: approval-mozilla-aurora+
This failed to apply to aurora: 

[:~/mozilla/unified] $ hg graft -er 9e117944cd9f
grafting 333353:9e117944cd9f "Bug 1257234 - Detect main thread's stack size at runtime, on Windows. r=ted"
merging js/xpconnect/src/XPCJSRuntime.cpp
warning: conflicts while merging js/xpconnect/src/XPCJSRuntime.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Flags: needinfo?(jdemooij)
(In reply to Wes Kocher (:KWierso) from comment #20)
> I had to back this out from beta and esr45 because it made Windows 8 PGO
> m-oth tests fail really frequently:
> 
> https://hg.mozilla.org/releases/mozilla-beta/rev/db9f13bf8a43
> https://hg.mozilla.org/releases/mozilla-esr45/rev/5f7ff8d56225
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=946012&repo=mozilla-
> beta
> https://treeherder.mozilla.org/logviewer.html#?job_id=946014&repo=mozilla-
> beta

Aurora looks green so far, fwiw.
(In reply to Wes Kocher (:KWierso) from comment #20)
> I had to back this out from beta and esr45 because it made Windows 8 PGO
> m-oth tests fail really frequently:

Sorry about that. Looks like bug 1259699, we should try again with that change.
Is it possible that this change is also responsible for intermittents like bug 1259747 and bug 1258073?  My patch in bug 1177488 was backed out over m-oth oranges, e.g.:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=95494e68c722&filter-searchStr=Windows%20x64%20debug&selectedJob=24650805
(In reply to Nathan Froyd [:froydnj] from comment #23)
> Is it possible that this change is also responsible for intermittents like
> bug 1259747 and bug 1258073?  My patch in bug 1177488 was backed out over
> m-oth oranges, e.g.:

Can you still reproduce this with the patch in bug 1259699 (just landed on inbound)?
You need to log in before you can comment on or make changes to this bug.