Closed Bug 142869 Opened 23 years ago Closed 23 years ago

gcc -O2 bug in nsBinaryOutputStream::Write64 breaks fastload

Categories

(Core :: XPCOM, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: cathleennscp, Assigned: bugs)

References

Details

(Keywords: perf, qawanted, regression)

Attachments

(3 files, 4 obsolete files)

Trunk checkin on 5/2 between 17:49 to 20:58 hr PST caused regression on txul (window open) Also note, this is a trade off between startup(ts) improvement and slowing down window open (txul). http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=05%2F02%2F2002+17%3A49%3A00&maxdate=05%2F02%2F2002+20%3A58%3A00&cvsroot=%2Fcvsroot Here are some perf data from a few tinderboxes: tinderbox commet ------------------- txul: 460 -> 474 ( +14ms, or +3.04%) ts : 1587 -> 1529 ( -58ms, or -3.65%) tinderbox luna ------------------- txul: 883 -> 901 ( +18ms, or +2.04%) ts : 3379 -> 3288 ( -91ms, or -2.69%) tinderbox sleekstack ------------------- txul: 1695 -> 1733 ( +38ms, or +2.24%) ts : 5425 -> 5283 ( -142ms, or -2.62%) ben's fastload XUL is most suspecious.
Keywords: perf, regression
Ben! Busted! :-) In principle, fastload should have no effect on window open times. But perhaps something may be broken here. varga also noted that he was seeing his fasl file invalidated during 'new window', although I couldn't reproduce that (and if we were trashing the fastload file during a 'new window' I would expect that the window open times would have increased by much more than ~3%).
Severity: normal → critical
Here's what I think this is (will investigate shortly) - GetPrototype on nsXULProtoCache is being called for all types of XUL docs, .properties, .js, etc, and some new code has been introduced to that function to handle .xul documents. An extra guard needs to be included, likely.
Status: NEW → ASSIGNED
jrgm, yeah, I'm seeing my fasl file invalidated during 'new window' on linux I checked windows and there it seems ok.
I meant 'new window of some app'. e.g bookmarks, download manager, JS console etc.
Jan, I don't see the XUL.mfasl being invalidated in the mozilla.org trunk build on Linux from this morning, when opening new xul windows (e.g., history, prefs, various pref panels, bookmarks). Don't know why I don't and you do.
I'm seeing the described problem only with gcc 3.0.4 and -O2 I recompiled with |gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)| and now it works ok. Sorry for false alarm.
varga: it's possible you've found a gcc 3.0.4 -O2 bug, but you might also have found an uninitialized variable or memory ambiguity (pointer aliasing) bug in or near the new FastLoad code. Cc'ing dbaron. /be
The same behaviour with gcc 3.1 -O2
What I'm seeing in my gcc 3.0.4 -O2 build is that when I open the bookmarks window, I get a much smaller XUL.mfasl file written out. I'm not getting an Invalid.mfasl, though. I assume you just meant the former.
yeah, exactly
OK, so what I'm seeing is that every time we try to create a new toplevel window, we fail at the modified time check (the one that has the |#ifdef DEBUG| printf) in nsFastLoadFileReader::ReadFooter (called from: nsFastLoadFileReader::Open NS_NewFastLoadFileReader nsFastLoadService::NewInputStream nsXULPrototypeCache::StartFastLoad I'm seeing (based on an additional printf): fastLoadMtime=4619835791474493488 currentMtime=1021790718000
Or, slightly more informative (yes, "%llX" works, however obscure it may be): fastLoadMtime=0x401CF42CE77625E8 currentMtime=0xEDE77625E8 The lower word is the same, but the upper word is totally different. > date "+%s" 1021939318 which suggests a valid time (in milliseconds instead of seconds) should be 1,021,xxx,xxx,xxx suggests that it's fastLoadMtime that's wrong. Investigating.
I believe the problem is in nsBinaryOutputStream::Write64. If I put printfs into that function (before and after the NS_SWAP64), things begin working. (I had to take all my printfs out and put them back one at a time to figure that out, since once it starts working I need to remove the fastload file.)
This testcase is a somewhat-simplified testcase (I just wrote it to see if my understanding of the problem was correct, and I didn't make any further attempts to simplify it). This bug can be seen with either the C or C++ compiler (I had a C++ version using reinterpret_cast instead of old-style casts, but I changed it so I could use the C compiler as well). [my usual tcsh prompt] > set prompt="%? %# " 0 > /usr/local/gcc-3.0.4/bin/gcc test.c 0 > ./a.out 0 > /usr/local/gcc-3.0.4/bin/gcc -O test.c 0 > ./a.out 0 > /usr/local/gcc-3.0.4/bin/gcc -O2 test.c 0 > ./a.out 1 > /usr/local/gcc-3.1/bin/gcc -O2 test.c 0 > ./a.out 1 > gcc -O2 test.c 0 > ./a.out 1 > gcc -v Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98) 0 > gcc -O test.c 0 > ./a.out 0 >
To summarize the above confusing log, the bug occurs at -O2 but not at -O, and is present in gcc 2.96-98 (RedHat), gcc 3.0.4, and gcc 3.1.
Summary: window open slowed down by 2-3% → gcc -O2 bug in nsBinaryOutputStream::Write64 breaks fastload
Actually, though, that doesn't explain the new window time increase, since sleestack, luna, and comet all build with -O, not -O2. (And all use gcc 2.96.) Unless, that is, they use a version of 2.96 that also shows the problem with -O. Could someone test that on one of those boxes?
Hmm. So, -fno-strict-aliasing fixes this. It would surprise me if this is a violation of the aliasing rules, though.
(*(unsigned int*)&(x)) You're taking a long long, and turning it into an unsigned int. I don't think doing so via pointers doesn't make a difference. You need to use the gcc extention of having: union { int i[2]; long long ll; } like js does. I think.
Actually, the cast isn't necessary to reproduce the bug. I think it has something to do with gcc treating the upper and lower parts of the long long as though they were separate variables.
Oh, but if I remove the casts in NS_SWAP64 as well, then the bug goes away. So it's NS_SWAP64 where the problem is...
This fixes the -O2 problem for gcc. I'm not sure how portable it is, though.
> //XXX What if we have a compiler that follows > // aliasing rules strictly but doesn't have a 64-bit int type? You need to do the same with LL_SHL and friends. Actually, why can't you do that anyway? This code shouldn't care about HAVE_LONG_LONG directly, should it? Actually, isn't the x+1 in there for the !HAVE_LONG_LONG currently wrong? PRInt64 is a struct of {lo, hi}, and theres no guarnetee that there isn't any padding in the middle, is there? Not that I know of a compiler which will pad between 32 bit types, mind you.
Attached patch style nit-picking (obsolete) — Splinter Review
I'll look into a prlong.h patch, but I'll have to run it by wtc. We should get this patch in now, to fix the bug ASAP. bbaetz: no compiler known will pad between hi and lo, so let's not borrow trouble just yet. It's ok to test HAVE_LONG_LONG outside of prlong.h, too, I think, but I am a hacker. /be
Attachment #84378 - Attachment is obsolete: true
On x86 Linux, NS_SWAP64 is significantly less efficient than a crude generic approach (see the attachment). Sometimes fancy isn't better. In the long term wouldn't it be better to use system services where available. Glibc, for example, provides /usr/include/byteswap.h.
Attached file test code
Compare the generated code.
So um, what are we doing here?
Comment on attachment 84399 [details] [diff] [review] style nit-picking r=dbaron (or, rather, I'll assume r=brendan for what I did and I'm happy with brendan's changes)
Attachment #84399 - Flags: review+
Comment on attachment 84399 [details] [diff] [review] style nit-picking sr=brendan@mozilla.org, dbaron wrote it, so his r= isn't enough, but I bet Ben agrees. I'm gonna check this in tonight unless someone hollers. /be
Attachment #84399 - Flags: superreview+
Comment on attachment 84399 [details] [diff] [review] style nit-picking r=bryner
Attachment #84399 - Flags: review+
Comment on attachment 84399 [details] [diff] [review] style nit-picking Checked in 2002-06-28 22:14 PDT.
Attachment #84399 - Attachment is obsolete: true
Hm, I don't see any Txul improvement on any of the tinderboxes after this was checked in. Does something need to be clobbered on the tinderboxes, or is there another problem lurking here?
Hmm, we seem to have veered away from the original intent of this bug. The tinderbox don't build with -O2 so this bug is moot for them, no?
I nuked the mecca tree on Testerbox to see if we get an improvement there.
Attached patch fix _a la_ tenthumbs' suggestion (obsolete) — Splinter Review
Ben, bryner, can you guys test this and comment? /be
Comment on attachment 89800 [details] [diff] [review] fix _a la_ tenthumbs' suggestion We should make sure to test this on Windows given: http://lxr.mozilla.org/seamonkey/source/nsprpub/pr/include/prlong.h#79 Other than that, r=dbaron.
Attachment #89800 - Flags: review+
Thanks to Ben for testing -- Ben, can you sr=? I'll get this into trunk, and in the branch (if drivers will take it). /be
Attachment #89800 - Attachment is obsolete: true
Comment on attachment 90171 [details] [diff] [review] dbaron was right, need i64 suffix on Windows Shouldn't those be UL and ULL? (And perhaps something special for the i64?)
I don't think it matters provided x is uint64, and I was concerned about the lack of ull or u64 or whatever on all platforms. Anyone have a clue? At this point, I reckoned it was better to imitate NSPR's prlong.h. I could add (uint64) casts of (x) all over, but that seems like overkill. These macros are meant for uintN types (N in {16,32,64}). /be
Comment on attachment 90171 [details] [diff] [review] dbaron was right, need i64 suffix on Windows sr=ben@netscape.com builds & runs for me (winxp)
Attachment #90171 - Flags: superreview+
If these macros are only meant for unsigned quantities then I suggest changing their names to make that very very obvious, not to mention a comment where they're defined. If speed really is an issue then why not consider inline assembly. You can do this a lot faster on a x86 with just a few instructions. BTW, is the WIN16 stuff even necessary?
tenthumbs: I'll attach a patch that uses ull if someone will tell me what the MS syntax is (i64 probably isn't it). Failing that, I'll bury a cast in the LL__ macro and all will be well. I copied NSPR's prlong.h, WIN16 and all -- I think that's better than not copying this test from prlong.h that depends on HAVE_LONG_LONG. /be
Attached patch my final offerSplinter Review
This should do it, unless someone cares to find whether MS supports u64 or ui64 or some such. I think Ben's sr= carries over. We don't need to asm-optimize this without profile data showing a hot spot -- premature optimization is the root of all evil, etc. OTOH, tenthumbs, you did kindly point out a way to reduce the number of shifts beyond what gcc seems able to do given the NS_SWAP32-composed version of NS_SWAP64, and there's nothing premature about my taking your advice and coding the C++ slightly differently, given that I have to hack this C++ (section of an IDL) file anyway to fix the bug. It rubbed me the wrong way to ignore your initial comment and go with the SWAP32 based patch. I hope you don't mind my heeding your advice there. As for trying to use glib headers etc., I confess that I'm too lazy to bother with OS-purveyed SWAP64 macros, and I argue that the code here is too trivial to worry about sharing for optimization and future fix consolidation benefits. If anyone has profile data indicating the need for assembly or OS header usage, please share. /be
Attachment #90171 - Attachment is obsolete: true
Comment on attachment 90384 [details] [diff] [review] my final offer Ben says his sr= still applies. /be
Attachment #90384 - Flags: superreview+
Fixed on the trunk. I think this should go into the branch for posterity. Mailing drivers, /be
Keywords: mozilla1.0.1
Blizzard approved, it's in the branch too. /be
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Keywords: mozilla1.0.1fixed1.0.1
Resolution: --- → FIXED
Attachment #90384 - Flags: approval+
hmm, need help with verification of the fix.
Keywords: qawanted
Works as expected on Linux. Opening a new window no longer invalidates fast load file in an -O2 build.
varga VERIFIED. /be
Status: RESOLVED → VERIFIED
Component: XP Miscellany → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: