Closed Bug 359453 Opened 19 years ago Closed 1 year ago

Excessive local variable cause MSVC to insert stack probe code

Categories

(Core Graveyard :: GFX, defect)

x86
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: swsnyder, Unassigned)

Details

(Keywords: perf)

Attachments

(4 files)

The function openPrefFile() uses a 4KB file buffer which is allocated as a local variable. This, in addition to 72 bytes of other local variables, causes MSVC v7.1 to insert stack probes into the code. This describes the compiler's behavior: http://windowssdk.msdn.microsoft.com/en-us/library/ms648426.aspx What I see with FF 2.0 at run time is a call to _chkstk() to verify that the 0x1048 bytes of local variable use won't cause a stack underrun. The silent insertion of the stack probes makes Firefox both bigger and slower than need be. The quick fix is to drop the size of that 4096 byte buffer down to 4000 bytes, so that the total stack use by local variables is below 4096 bytes.
Component: Preferences → Preferences: Backend
Product: Firefox → Core
QA Contact: preferences → prefs
Version: 2.0 Branch → 1.8 Branch
This patch reduces the size of the file buffer to 4000 bytes. With this modification the compiler generates code for 0x0fe8 bytes of stack space used as local variables. Since this is below the 0x1000 byte threshhold, no stack probe is inserted into the code. It might be smarter just to move the file buffer to static allocation, but as I'm not very familiar with the Preferences code, I'm taking the path of least invasiveness.
Well, I'm not sure what to do with this. After finding this 4096+ use of stack, I got curious and did some snooping. There are far worse offenders than openPrefFile() but I'm told that there isn't any particular policy on local stack use, so maybe this isn't worth pursuing. FWIW, setting a break point on the more-than-4096 case in _chkstk yields ~950 hits in the course of bringing FF 2.0 up to www.yahoo.com. It seems that FindSafeLength() (8000-byte local buffer) is the worst offender in terms of frequency, though not in size.
I don't think we care about this in code that isn't perf-sensitive and isn't likely to recurse and chew up lots of stack. roc, is there a way not to use gobs of stack space for this function? Steve says this showed up in a profile. http://lxr.mozilla.org/mozilla/source/gfx/src/shared/nsRenderingContextImpl.cpp#513
Assignee: swsnyder → general
Component: Preferences: Backend → GFX
QA Contact: prefs → ian
Yeah, this would be easy to fix. I'll do it.
Attached patch fix?Splinter Review
This makes it so the buffer isn't used on Windows; is the compiler smart enough to not do the run-time check?
Assignee: general → roc
Status: NEW → ASSIGNED
(In reply to comment #5) > Created an attachment (id=244779) [edit] > fix? Nope. Both before and after patch: 00A0F6E0 push ebp 00A0F6E1 mov ebp,esp 00A0F6E3 mov eax,1F44h 00A0F6E8 call _chkstk (0A6ACC0h) 00A0F6ED mov eax,dword ptr [ebp+0Ch] if (aLength <= aMaxChunkLength) 00A0F6F0 cmp eax,ebx return aLength; 00A0F6F2 jbe FindSafeLength+9Dh (0A0F77Dh)
Attached patch how about this?Splinter Review
try this... I assume you're doing an optimized build
Attached patch how about this?Splinter Review
try this... I assume you're doing an optimized build
(In reply to comment #8) > Created an attachment (id=244852) [edit] > how about this? > > try this... I assume you're doing an optimized build Yes, I am. With this patch applied it seems that FindSafeLength is entirely optimized away. That is, there's no actual code to go with those lines of source. And this is the start of FindSafeLengthFromClusters(): 00A0F500 push ebp 00A0F501 mov ebp,esp 00A0F503 mov eax,1F44h 00A0F508 call _chkstk (0A6AD70h) 00A0F50D mov eax,dword ptr [ebp+8] 00A0F510 push esi 00A0F511 mov esi,ecx
FindSafeLength probably got inlined. You'll need to find it in amongst the caller code :-)
Just an update for those interested. My original comments were regarding MSVC 7.1 (VS2003) and Firefox 2.0.x. The figures below reflect the use of MSVC 8.0 (VS2005) and the Mozilla trunk code, pulled 15 Nov 2007 and built as SeaMonkey with optimization enabled. Those _chkstk probes are still used in the updated compiler. Breakpoints set on the more-than-4096-bytes case show 1278 hits when bringing SeaMonkey v2.0a1pre up to about:blank page. Bringing the same build up to http://www.yahoo.com yields 5269 hits to the breakpoint. FYI.
Flags: blocking1.9?
Keywords: perf
Version: 1.8 Branch → Trunk
Roc/Bz - any sense of how important this is? Are there other areas in the code that would respond to this?
5269 hits probably isn't a big deal, these probes should be really cheap. I don't think we should block, certainly.
Not going to block for this - but Steve we'd gladly take a patch if you get this fixed up... Bz if you disagree with this please comment/re nom.
Flags: blocking1.9? → blocking1.9-
We should take a look at this periodically to see if we should limit some of our stack usage in places, but I agree, these should be fairly cheap (cheaper than mallocing most likely) and probably not worth blocking on.
Product: Core → Core Graveyard
This is a mass change. Every comment has "assigned-to-new" in it. I didn't look through the bugs, so I'm sorry if I change a bug which shouldn't be changed. But I guess these bugs are just bugs that were once assigned and people forgot to change the Status back when unassigning.
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: