User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.4) Gecko/2008111318 Ubuntu/8.04 (hardy) Firefox/3.0.4
Build Identifier: 20081201 Firefox/3.1b2
Benchmark: ScheduleWorld sw2 web application (free), Tools -> Browser Test -> Speed. The test runs 15,000 iterations over a sampling of code that is often used when the app is running.
On a Windows XP virtual machine (host = Linux, guest = XP) the benchmark runs 13x faster with tracemonkey enabled! On a real XP machine (not virtual) tracemonkey is the same speed or slightly slower.
Summary: exact same benchmark code, exact same OS, but tracemonkey only has an effect while running in the virtual machine.
It seems the thresholds for when to trace are wrong. I tried on a slower machine, but perhaps it wasn't slow enough. I also tried on a slow 700MHz laptop running Linux, but tracemonkey didn't kick in there either. I'm guessing there are timing thresholds in tracemonkey that are preventing it from tracing code that really should be traced. I know it should be traced because it makes a huge difference in the VM case. An amazing difference...
I noted the same behaviour with 3.1b1 and Minefield 3.1b2pre.
Reproducible: Always
Steps to Reproduce:
1. Log in to http://www.scheduleworld.com/sw2/
2. Run Tools -> Browser Test -> Speed
3. JIT is enabled.
Actual Results: JIT does not increase speed on a real machine. JIT dramatically increases speed on a virtual machine.
Expected Results: JIT increases speed on a real machine.
You can always be more memory efficient in a later release by compiling less. If it makes this problem easier to solve, please consider adjusting TM to compile more often / with decreased thresholds. People need to see how amazing this is.
Mark, I think that we're in violent agreement ;) on nearly everything you just posted. Here are some comments back, kinda long:
First, and I feel this is most important-- although I'm NOT a competent coding person, and cannot "take charge" of your bug: I agree that TM's failure to improve this particular loop test is interesting, VERY interesting, and worthy of a bug all by itself. But this title and bug description, "piling on" all kinds of baggage about Linux versus Windows, and about vastly better performance while running Windows within an unidentified hosting VM manager... I think the bug should be tightened down, focus only on the "degrades when JIT is off" issue. (Again, just my OPINION.)
Second: I know, absolutely and totally for sure, that you're under-estimating the number of context switches involved here while running native Threads. In my "too much eye-candy" desktop, with no disk I/O at all, and when I let it "quiet down" by stopping my typing of this post for several seconds, I'm STILL seeing no less than 900 context switches per second. This leads to two sub-points.
First subpoint: I'm willing to bet many virtual beers that I can make Linux "win" against Windows simply by stopping Compiz-Fusion and switching from KDE to a much lighter Window Manager (e.g., e17 with GTK+ support, or ICEWM with Gnome support, etc.). And doing absolutely nothing else. Even with all my desktop overhead, it's barely a 10% disadvantage-- and heck, maybe dumping Compiz alone would be enough, still keeping one of the "fat" WMs in charge (KDE or GNOME). Now that I've obtained hard numbers for Linux versus Windows on identical hardware, I'd like to recommend that we also toss out Linux versus Windows as an "issue" of this bug. (Again, it's YOUR bug, this is only my feeling.)
Second subpoint: now that I've come back with actual counts of context switches on Linux for my too-many-layers software stack, it might be appropriate to also toss out the Windows-within-VM-Manager-versus-Native part of the bug. With this new bit of data, I feel even more confident in guessing that the VM implementation does dramatically better by implementing a many-to-one mapping of Windows Threads onto VM Manager native Threads. But without access to performance analysis tools for that particular VM manager, Firefox developers might not be able to get a good handle on why it's so good.
(Still on second sub-point) However, I can easily imagine Firefox's use of thread-like concurrency structures, both "native" and "internal/lightweight", to be implemented in a less-than-optimal manner. (Gecko is still at "Version One".) Tracemonkey, within Firefox, could be suffering from excessive overhead. But I'll SWAG that to be a kinda big research project-- and definitely a different bug ID, even if some TM or Firefox code expert DOES raise a hand to say "I know of some easy, low risk changes which could improve this a lot".
So what's left after we toss out "Windows versus Linux" completely, and move "Firefox Windows running jit.content=true inside VM manager xxx is amazingly faster than same Firefox running within Windows native" to another bug? Exactly what you just said is left-- running with jit.content=true degrades Firefox performance on this particular script, and we both wonder why this 15000x repetition loop didn't get *better*.
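For anyone who wants to back the context-switch figures above with numbers of their own, here is a minimal sketch, assuming a Linux system whose `/proc/stat` exposes the cumulative `ctxt` counter. The helper names `parse_ctxt`, `read_ctxt`, and `ctxt_rate` are hypothetical and are not part of any tool mentioned in this bug. The result is a system-wide rate, not a per-process one.

```python
import time


def parse_ctxt(stat_text: str) -> int:
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no 'ctxt' line found")


def read_ctxt(path: str = "/proc/stat") -> int:
    """Read the counter from the live system (assumes Linux procfs)."""
    with open(path) as f:
        return parse_ctxt(f.read())


def ctxt_rate(interval_s: float = 1.0) -> float:
    """Sample the counter twice and return context switches per second."""
    before = read_ctxt()
    start = time.monotonic()
    time.sleep(interval_s)
    elapsed = time.monotonic() - start
    return (read_ctxt() - before) / elapsed
```

Calling `ctxt_rate()` once on an idle desktop and once while the Speed test is running would give two comparable figures; whether the difference matters for this benchmark is exactly what is being debated here.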
<quote> focus only on the "degrades when JIT is off" issue</quote>
I think this is an oversimplification.
<quote>"piling on" all kinds of baggage about Linux versus Windows</quote>
I've never argued it was a Windows vs Linux issue.
<quote> you're under-estimating the number of context switches </quote>
My experience analyzing the CPU cycles lost to context switching tells me this is irrelevant for this particular test. If you have evidence that shows otherwise please post. I don't believe this is a threading issue either (syscalls vs user mode futex to guard resources etc.) because I have no evidence that the single threaded test is facing massive contention for resources. I simply don't see how it could. I usually do this analysis with tools under Linux and don't know how to do this analysis under Windows so I can't provide data. It would be fine to agree to disagree and leave this up to the TM folks.
<quote>and move "Firefox Windows running jit.content=true inside VM manager xxx is amazingly faster than same Firefox running within Windows native" to another bug</quote>
I've really tried to build a case for _this_ bug around this evidence:
1. in VM without JIT: speed = X
2. in VM with JIT: speed = 13X
3. no VM without JIT: speed = Y
4. no VM with JIT: speed = ~Y (why not ~13Y?)
I'm simply hoping the TM folks can use this data to make TM better. I am reluctant to speculate further. I trust the TM devs to take the data for what it's worth and do the right thing. I'm willing to leave it at that. Cheers.
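The four-case evidence above boils down to two ratios, one per environment. A rough sketch of that arithmetic follows. The function name `jit_speedup` is hypothetical, and the scores are placeholders chosen only to match the X / 13X / Y / ~Y pattern; they are not measurements from this bug.

```python
def jit_speedup(score_without_jit: float, score_with_jit: float) -> float:
    """Return how many times faster the JIT-enabled run is (higher score = faster)."""
    if score_without_jit <= 0:
        raise ValueError("score_without_jit must be positive")
    return score_with_jit / score_without_jit


# Placeholder scores (NOT real data): X = 100 and Y = 400 are arbitrary.
vm_speedup = jit_speedup(100.0, 1300.0)     # cases 1 and 2: X vs 13X
native_speedup = jit_speedup(400.0, 400.0)  # cases 3 and 4: Y vs ~Y

print(f"VM: {vm_speedup:.1f}x, native: {native_speedup:.1f}x")
```

Reporting both ratios side by side keeps the comparison independent of the absolute speed of either machine, which is the point of the X versus Y framing.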
From what I understand here, there should be a significant tm perf gain here, requesting wanted1.9.1?
Is this still valid?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: wanted1.9.1? → wanted1.9.2?
The 13x speedup difference is no longer reproducible. I wonder if whatever corner case was causing this has been 'fixed'? I wonder if there are too many steps in the function() that is being tested and TM is no longer tracing it. It would be really handy to me to be able to tell when this occurs for given functions. One too many steps and TM is disabled; if there was a way to test for that it wouldn't be too hard to make minor code changes to get huge performance benefits. I digress...
Looks like scheduleworld.com no longer exists. Mark, is there an alternative site to test? Otherwise, this should be closed as INCOMPLETE.
Just realized that Mark has an email at scheduleworld.com as well. Seems unlikely we're going to hear back from him.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → INCOMPLETE