Closed Bug 536334 Opened 15 years ago Closed 14 years ago

facebook chat unresponsive, lags, and causes firefox to crash or run slowly

Categories

(Firefox :: General, defect)

3.5 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking1.9.1 --- needed
status1.9.1 --- .9-fixed

People

(Reporter: cerijweber, Unassigned)

References

Details

(Keywords: common-issue+, regression)

Attachments

(1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6

Recently updated Firefox. Facebook works normally until chat opens. Chat runs slowly and doesn't keep up with my typing. This causes Firefox to momentarily freeze until it gets chat to function again (I get the rainbow pinwheel while I wait). The rest of my mac works perfectly while I wait. It also freezes when new chat windows open or when multiple people are trying to chat with me. I tried using Safari to make sure it wasn't my machine and safari runs just fine with facebook chat.

Reproducible: Always

Steps to Reproduce:
1. Log on Facebook
2. Get a chat/start a chat
3.
Actual Results:  
Chat begins to freeze and become unresponsive.

Expected Results:  
Before the update, chat worked normally
WFM with Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091222 Minefield/3.7a1pre (.NET CLR 3.5.30729) ID:20091222050308. Please try in safe mode or create a new profile.
http://support.mozilla.com/en-US/kb/Managing+profiles
http://support.mozilla.com/en-US/kb/Safe+Mode
Are you getting only a hang (firefox stops responding), or are you also getting a crash (firefox closes without warning)? If you get a _crash_ then please give a stacktrace. https://developer.mozilla.org/En/How_to_get_a_stacktrace_for_a_bug_report
(In reply to comment #1)
> WFM with Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre)
> Gecko/20091222 Minefield/3.7a1pre (.NET CLR 3.5.30729) ID:20091222050308.
> Please try in safe mode or create a new profile.
> http://support.mozilla.com/en-US/kb/Managing+profiles
> http://support.mozilla.com/en-US/kb/Safe+Mode
> Are you getting only a hang (firefox stops responding), or are you also getting
> a crash (firefox closes without warning)? If you get a _crash_ then please give
> a stacktrace.
> https://developer.mozilla.org/En/How_to_get_a_stacktrace_for_a_bug_report

Both have happened, but usually firefox stops responding. Normally it recovers after a few minutes. It's only crashed a couple times.
Could you get me the stacktraces for when it did crash? Also, did you test with a new profile?
There are a lot of other people complaining about the issue here:

http://support.mozilla.com/en/forum/1/528680?forumId=1&comments_threshold=0&comments_parentId=528680&comments_offset=60&comments_per_page=20&thread_style=commentStyle_plain

it seems the problem is specific of 3.5.6
That means downgrading to 3.5.5 seems to solve the problem.
Confirming based on the support forum thread.
(Although being slow and crashing are too separate things).

There are also reports in the thread from Windows users, so I'm switching the platform on this to All.

This could be branch-only.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: common-issue?
OS: Mac OS X → All
Hardware: x86 → All
Could possibly help you find the problem: I have recently installed the AdBlock Plus to my Firefox 3.5.6 and it seems to have blocked whatever code lagged the browser and locked text. You may want to look in to what adblock plus blocks on the facebook user homepage (http://www.facebook.com/home.php?ref=home) to see if you can trace which code is giving everyone the problem. Hope that helps!

Just in case you were wondering:
OS - Win. Vista SP2
FF Vers. - 3.5.6
AdBlock Plus filter list - The one labelled for the US
This was observed from firefox 3.0-3.5, however in 3.5.6 i didn't see this upto now.
Problem observed consistently from 3.6 beta5 for Windows. Utterly crippling. Forces me to restart browser. Reproducable always.
Al, Tony: can we get someone from QA looking into this, please?

Daniel: how long before the effects are felt?
Keywords: qawanted
At first glance, i've repro'd this on Fx3.5.6.  it seems to lag when the other person is typing at the same time.  Also noticed both parties dont have to be using the same facebook chat client.  I was using FB chat, and clint was using adium client for FB.  

The lag caused about 3 second delay, where i couldnt see what i was typing..  a few seconds later, my own text i was typing caught up to me.  

More investigation needed
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20status1.9.1:.6-fixed is the list of bugs we took in Firefox 3.5.6

For those experiencing the problem, did it happen immediately after upgrading to Firefox 3.5.6, or a few days after? If you download Firefox 3.5.5 from ftp.mozilla.org and install that, does the problem go away?
cc'ing some JS folks: hey guys, the symptoms here sound like a case where we're either falling off trace or somehow otherwise getting lagged on processing. Might be misreading, but any way you guys could take a look? I'm sure the QA team will help you with a test chat if needs be!
Version: unspecified → 3.5 Branch
The symptoms don't sound like tracing issues to me. My money is on GC pauses maybe? But 3s is really long, even for that. We should try to reproduce locally and then shark/profile it to see what happens.
Whiteboard: I remember this happening only after I got the upgrade. I think it was immediate--the day after I think?
Comment from reporter, placed in whiteboard
"I remember this happening only after I got the upgrade. I think it was immediate--the day after I think?"
Whiteboard: I remember this happening only after I got the upgrade. I think it was immediate--the day after I think?
Andreas: Tony can reproduce, swing by his desk tomorrow?
Ok, deal. Tony, can you grab me tomorrow. I have an interview at 10, right after maybe? If you can reproduce with a debug build or a optimized build with symbols (-g3), our chances are a lot better. My preferred platform is mac since we can use shark there, but I will make due with whatever you have otherwise.
(In reply to comment #11)
> Al, Tony: can we get someone from QA looking into this, please?
> 
> Daniel: how long before the effects are felt?

I'm getting the problem on two Dell laptops, one with 2 gigs of ram, the other with 768 megs. The 2 gig machine is less affected. Each machine experiences the FB chat lag while running 3.6b5.  Each also has 3.5.5 installed although they are never used. Neither has been upgraded to 3.5.6 at any time. Furthermore, if either machine runs Chrome and chats with the other running Firefox, the problem persists in the one running Firefox, but Chrome shows no symptoms. 

As for the amount of time before effects are felt, it's noticeable right away, but does seem to worsen over time. Then again, over time I end up with a lot of tabs and other programs open too. Still, Chrome is immune under the same conditions.

Specifics on the symptoms match those described elsewhere in this discussion, with the addition of the Facebook chat notification sound getting stuck repeating in a loop, which didn't used to happen. It seems that this problem started after Beta 5 or perhaps Beta 4. Hard to say exactly.
Additional details: If it matters, both machines run XP SP3 with all updates. One connects to the internet via the standard intel chipset, the other via Verizon's wireless broadband. Both are rather fast connections.
Daniel, does the machine swap/page a lot when the pause happens? You should hear the HDD or some sort of resource monitoring panel should confirm it.
On the machine with less memory, all programs hang and there appears to be some sporadic disk thrashing/pagefile usage. This seems to occasionally be triggered by attempting to type in the chat client. CPU usage exceeds 80 percent and I'm stuck with an hourglass cursor for 3 to 10 seconds... then the letters typed suddenly appear in the field so I can resume. Sometimes resuming causes a repeat of the symptoms.
Yeah my best guess would be is we GC too late and use a ridiculous amount of memory, making the system unusable. cc'ing dmandelin. He has been poking at similar bugs lately. I will take a look tomorrow. If we come up with a patch, it would be great if you can help testing. Thanks for the input so far.
We are able to dup this now and are currently bisecting/diagnosing.
I may have spoken too soon when I said we can repro this. We can make FF get kind of unresponsive by doing Facebook chat, but I'm not sure the detailed symptoms match. I'll describe what I can get to happen below. If anyone who has this bug can compare their symptoms to mine in detail, that would be greatly appreciated.

Here is what I see:

  - Procedure: Open up FB chat with someone, and have both of us type lots of
               messages rapidly.

  - Results: CPU usage starts at 20-40% and FF responds fine.
            
             As we keep typing messages rapidly, CPU usage rises past 80% to
             100%. This happens much more easily if both parties are typing at
             the same time. If it's just me typing, it maxes out around 70-80%.
            
             If we keep typing while CPU is at 100%, the screen starts lagging
             behind our typing. If I slow down typing, as long as lagged
             messages are still being caught up on, FF will be unresponsive.

             If we stop typing, after several seconds the lagged messages are
             finished, and then CPU usage goes back down to 20-40%.

If it is true that everyone else is seeing a variant of what I see, then I think what is happening is that FF uses a relatively high amount of CPU on the chat feature. If your machine isn't doing too much else, it will probably not use up enough CPU to cause problems. But if you are doing multiple chats, have open other FF tabs that use substantial amounts of CPU with JS or Flash, or have other apps running that are using CPU, then I can imagine the problem getting a lot more severe. (As long as you are strictly below 100% CPU usage it should be OK. But once it reaches 100%, things can start lagging and get worse and worse.)

The hypothesis above doesn't explain the fact that this problem seems to have started in FF 3.5.6. One possibility is that the bug users are seeing is not what I laid out above. Another is that the high CPU usage got slightly worse in 3.5.6 for some reason, enough to put some users over the lag threshold.
Another thing I should add is that if you can reliably reproduce this problem in Firefox 3.5.6, and 3.5.5 works fine for you, then it would be incredibly helpful if you could test mozilla-1.9.1 nightly versions in the date range 11/2/2009 through 12/1/2009 to see on what day the problem started. I can provide instructions and help for doing this.
Whiteboard: regression-window-wanted
I ran TM tip with TraceVis and found that we pretty much don't trace FB chat activity. About 40% of total time was spent in JS, the rest in non-JS activities. But note that I couldn't get CPU usage to go much over 80%, so it seems we are already doing quite a bit better on tip here than we were in 3.5.6.

Profiling TM tip with Shark, I found that a good chunk of the time was in stuff for setting .innerHTML. So, I think if the problem here is just general slowness, it's looking like it may be related to both DOM/layout and JS.

I'll repeat these measurements with a 1.9.1 tip build on Monday.
I am using 3.5.7. Re installation of Firefox might help a bit. But not completely. What I observe is that when my friend/friends starts typing, the CPU percentage for Firefox process goes up to 50% and I get this lag in typing(while I was typing this comment). But I had worse lag on Facebook chat section. But after reinstallation of Firefox(kept my customization), it become much better. hope this helps.

santhosh
One thing I observed is that the cursor is blinking its little heart out when you type. This might be some visual artifact, but it seems to me we render and re-render the chat box with a very high frequency as the user types (much higher than once per character).
I have a friend who claims that disabling jit.content solved the issue for him.  Not sure whether he's using 3.5.6 or 3.5.7.
I am using version 3.5.7 of Firefox on the following system:
Windows 7 Professional x64
Intel Core2 Quad Q8200 @2.55GHZ
4GB DDR2
Sapphire Radeon 5770

I have the following add-ons installed for firefox:
Adblock Plus 1.13
HP Smart Web Printing 4.60

When using Facebook chat, Firefox's CPU usage would go up to 25% (using up an entire core of my quad core). Typing messages in facebook chat are met with considerable lag and sometimes cause Firefox to go into a non-responsive state for a few seconds.
I found that going to about:config and changing the value for javascript.options.jit.content to "false" fixed this issue. Upon changing it to false, I tested it out by having 3 facebook chats running at once with firefox only using 1-2% CPU usage.
Definitely common on support.
Tried again today on mac 10.6, and it's only repro'ing with Fx3.5.6 and jit.content enabled.  Using the same steps i've done back in comment 12.   

I've then tried two other scenarios:
1) on Fx3.5.6, jit.content=FALSE.   Then repeating steps above.
- Result: the lagging stops and the CPU % dips under 50%

2) Switching to Fx3.5.5, and jit.content=TRUE.  Then repeating steps above.
- Result: the lagging stops and the CPU % dips under 50%

On the receiving chat end (tomcat), he's running a 1.9.1 debug build on Mac 10.6: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8pre) Gecko/20100111 Firefox/3.5.8pre

Not sure if this helps, but Tomcat notes the following warnings thrown:

WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: refusing to set request header: file /work/mozilla/builds/1.9.1/mozilla/content/base/src/nsXMLHttpRequest.cpp, line 2910
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: refusing to set request header: file /work/mozilla/builds/1.9.1/mozilla/content/base/src/nsXMLHttpRequest.cpp, line 2910
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034
WARNING: Broadcasting loop!: file /work/mozilla/builds/1.9.1/mozilla/content/xul/document/src/nsXULDocument.cpp, line 1034 

Next steps i was going to try is finding a regression range on the nightlies between Fx3.5.5 and Fx3.5.6.   If anyone else here has time, it would be great to get some help with that.
Can anyone reproduce this on Firefox 3.6? I can't seem to ...
I can't repro the problem either on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2) Gecko/20100105 Firefox/3.6.   jit is enabled, and all that good stuff.
I also have this problem. Very troublesome. Please fix
Using 3.6 RC1 today and this problem was an unmitigated disaster -- similar to my earlier experience -- only worse. The worst system freeze occurred while working in another program (Word) with FF running only 6 tabs, 3 of which were Facebook pages. A friend's attempt to chat with me locked up everything for a full 30 seconds before I discovered what was going on. When I finally started hearing the much delayed notification "pop" Facebook uses, I remembered the familiar pattern was all too well. His string of attempts to contact me only worsened the problem, and after about 3 minutes I was finally able to reply, but he was already gone.

Given Facebook's extreme and growing popularity (second only to Google) I suggest using older (real world) computers with less RAM to test this properly. I doubt the public will tolerate an impaired Facebook experience, and a very large demographic still uses computers with only 1 gig of RAM. My expertise is in marketing and public relations. Give those people a bad Facebook experience and the sour taste will linger for years.
The following extra detail may or may not be related to this bug (again, using 3.6). After running FF for a few hours and it gradually slows down (as usual) to the point of requiring a restart (at anywhere from 150MB to 300MB of RAM usage), closing Firefox takes a very long time. After the GUI has left the screen and Firefox no longer shows on the task bar (I'm running XP), the execution file still appears in the Task Manager for a full minute or two longer while using about 98% of CPU cycles. During this time it very gradually releases memory over that time span until it finally leaves the list of processes. Of course, with the CPU under siege, I just watch and wait. Occasionally, the execution file gets stuck on and after a few minutes I have to just kill it. Whether this scenario would unfold the same way if I had only been running a single tab, I don't know. I never browse with just one tab, but that may also be worth considering.
Any luck on narrowing the regression range?
Brendan asked for a Plugin breakdown. Oddly enough, I disabled nearly all of these a couple of betas ago but they seem to be alive again. Hope it helps.

Installed Add-ons (all are up to date):

ENABLED Extensions
Java console 6.0.17
Java quick starter 1.0
Smoothwheel (AMO)
Test Pilot 0.4

DISABLED Extensions
Microsoft.net framework
Move media player

ENABLED Plugins
Adobe Acrobat 8.2.0.81
Google Update 1.2.183.13
iTunes application detector 1.0.1.1
Java Deployment Toolkit 6.0.170.4
Java Platform SE6 U17 6.0.170.4
Microsoft DRM (DRM Store Netscape Plugin) 9.0.0.4503
Microsoft DRM (DRM Netscape Network Object) 9.0.0.4503
Mozilla Default Plugin
Quicktime Plugin 7.6.5
Real Jukebox NS Plugin
Realplayer Version Plugin
Realplayer G2 Live-connect enabled plugin
Shockwave Flash
Silverlight Plugin
Windows Media Player Plug-in Dynamic Link Library
Windows Presentation Foundation
Yahoo Active X Plugin Bridge

DISABLED Plugins
Yahoo Application State Plugin

THEME
Snowleopardz2.0
Daniel, what flash version do you have?
Flash version is 10.0.42.34
Like I mentioned, all add-ons, plugins, etc., are up to date.
personally, i'd be more interested in knowing if it happens w/ all extensions and plugins disabled. if it doesn't, does leaving them all disabled and only enabling flash trigger it?
I actually had this problem on 3.5.7 and by setting javascript.options.jit.content to false I managed to eliminate the problem for myself, at least
This is extremely annoying. All my friends keep asking me why firefox is lagging and most have switched to chrome because of this. It started in 3.5.6 and I experience the same lag (100% CPU usage) on both OS X 10.6.2 and WinXP SP3. My windows machine actually got a BSOD and crashed. Updated 3.5.7 has done nothing to help the problem and thanks to this thread https://support.mozilla.com/en-US/forum/1/527120, I was able to find a temporary fix: setting "javascript.options.jit.content" to false.

PLEASE FIX THIS ASAP. Facebook is the number 2 site in the world! Why is the number 1 browser failing so hard?
Running 3.6 today and I disabled all plugins except flash and Mozilla default plugin. Didn't have as many tabs open as usual and chatting finally seemed to work normally on Facebook. Can't determine whether fewer tabs or disabled plugins solved the problem. If a plugin is the culprit, then any company could sabotage FF by making a common plugin that crippled the browser on a major Facebook function. Let that be a vulnerability and Firefox's days are numbered. By the end of this year, Facebook will practically BE the web. Like it or not, performance on Facebook will eventually become the 10-second litmus test by which every browser lives or dies. 

On that note, as a professional PR strategist I propose a tactic (lucky for you, for free). Tweak Gecko to perform better on Facebook than Chrome even if it requires compromising speed elsewhere a bit (Hopefully it won't, but if so... so be it).  Then, develop a "Facebook Benchmark" or "Social Media Benchmark" and start sending out press releases.  Let Google scoff and throw around other benchmarks all they want .... it won't matter. Yours will resonate with the public and acquire parallel branding in the public psyche. In other words, you'll ride the wave that matters -- the one the people are embracing hand over fist. Two-year waves like this must be ridden with gusto: http://www.alexa.com/siteinfo/facebook.com?p=tgraph&r=home_home

Now, show this to the board so you can shift more resources onto this. :)
I. Symptoms

I may have been able to duplicate this problem *once* this morning. Assuming there is only one problem, which I am beginning to doubt. Here's the bad stuff we observed:

 1. Generally high CPU usage when typing in the chat window: 70% or so. Google
    Chrome does about 30% under the same conditions. This high CPU usage seems
    to apply for most versions.

 2. Get the browser into a state where one CPU is mostly busy and typing is
    laggy. I got this on MacOS 10.5 with a 2009-10-08 nightly, just by chatting
    for a few minutes. This is the thing I could only make happen once.

 3. Get the browser into a state where one CPU is mostly busy and characters
    seemed to be dropped when typed in the chat box. This was on Vista with
    3.5.4, 3.5.5, and 3.5.6. 3.5.6 seemed to be worse but it's hard to tell for
    sure. Turning the JIT off didn't seem to have a big effect.

We also saw that things could get very bad in Vista with 6 Facebook windows open and chat messages coming in. I think that may be partly due to the design of the chat system, which seems to operate fully and independently in all tabs. So if we get CPU spikes over 50% with one, we're going to do really badly with 6 tabs, even if nothing special is going bad.

II. Issue 2: Analysis

I did 2 profiling runs with issue 2 in play. They gave similar results. The "Heavy" shark view shows:

 42.6%	42.6%	libmozjs.dylib	nanojit::Fragmento::getAnchor(void const*)	

Above this on the call stack is:

 0.0%	42.6%	libmozjs.dylib	str_replace(JSContext*, unsigned int, long*)	

The ASM view of Fragmeno::getAnchor shows all the time in the loop that appends the fragment at the end of the peer list.

The direct caller of getAnchor is GetNativeRegExp in regexp.cpp. Unlike the tracer, that function puts no limits on how many peers it will create at a given anchor. The value it passes to getAnchor is a hash of the regexp source characters and length and the regexp flags. So it seems unlikely that we would create many regexps with the same hash, but it is possible, and that would explain what happens there.

Note that Fragmento does not exist in Fx3.6, so this issue may not occur there.

III. Issue 1: Preliminary Analysis

I did a TraceVis run previously on "slow" FB chat. It showed us spending about half our time in JS, half elsewhere, with almost all the JS time in the interpreter. In a profiler, the time is very distributed, with no real hot spots; if anything, JS, event dispatching, and locking seem to show up more heavily.

That we don't trace FB chat, and we spend about half our time in JS, is sufficient to explain about a 60% greater CPU usage than Chrome. Exactly what the rest is, I'm not sure, although it seems possible it could be issues relating to event dispatching, CSS/layout (I saw a lot of CSS in my first try at profiling this a while back), or locking.

For this part, I think JaegerMonkey and Electrolysis are the real fixes.
This patch may alleviate issue 2. I tested it by artificially making the fragment lists get really long by (a) running a JS program that runs 1000 regexps, and (b) changing HashRegExp to return 1 always. Without this patch, we are very slow on that test in JIT mode. With the patch, we are faster in JIT mode than in non-JIT.
Why do we get such a long chain?
(In reply to comment #50)
> Why do we get such a long chain?

It's hard to answer that question because I was only able to repro the problem once. If we could repro, we could find out more by instrumenting the browser to print out the regexps that it jits and their hashes.
Is 6 tabs kind of brutal in other browsers as well? I'm assuming 6 tabs means 6 chat tabs in one window.
(In reply to comment #52)
> Is 6 tabs kind of brutal in other browsers as well? I'm assuming 6 tabs means 6
> chat tabs in one window.

G Chrome on my Mac; I just typed a bunch of chat junk at someone else at a moderately fast pace. (I think in general receiving chats hurts things even more, so they probably die faster under bidirectional conversation):

  FB tabs         CPU usage (%)

     1                25
     6                50
    12                70
    18                90
    24               100+ (it fell behind and got laggy here)

I have 2 cores, so it appears they're not using more than one in the default config (they use a new process for a new domain or something?).
We should also talk to FB. 25% cpu time of a machine that can execute billions of instructions a second for a chat application is ridiculous.
david: yeah, using multiple processes risks breaking 'run to completion', their solution (and hopefully the one everyone will pick) is process per domain (roughly).
nominating blocking1.9.1? flag
blocking1.9.1: --- → ?
--> needed for some 3.5.x release, for sure.

If this patch is reviewed and can be shown to help, please get it nominated for branch landing and we'll most likely approve it.

Still would like to be able to tie this to a regressing bug, though; any luck on narrowing the range further?
blocking1.9.1: ? → needed
david, can you post a tryserver build here for people to test?
Whiteboard: regression-window-wanted
(In reply to comment #58)
> Still would like to be able to tie this to a regressing bug, though; any luck
> on narrowing the range further?

The nanojit design flaw identified in comment 48 (Issue 2) was in nanojit from the beginning. It has been removed. No regression.

Comment 48 points to multiple potential problems, and of course the FB chat app was evolving. There's no single regression window or cause. The worst offender, which also happens to be the most understood bad cause here AFAICS, is the nanojit design flaw.

/be
(In reply to comment #59)
> david, can you post a tryserver build here for people to test?

The Mac build mysteriously busted, but a Windows version is available here:

http://build.mozilla.org/tryserver-builds/dmandelin@mozilla.com-try-f54e9176e0a5
I've been having a hard time getting understandable test results from the try server for my patch. I did a local Mochitest on Windows yesterday:

  1.9.1 tip              48 test failures
  + my patch             43 test failures

I don't know why there are so many failures on tip, but the results seem to show that the patch is OK. At this point, I think we can call it good after a careful review from nnethercote--it's a local change to one function only, so it should not be too hard to check on.

I'm also contacting FB to see if it makes sense to think there was a change with regexps that exposed the bug.
Attachment #423042 - Flags: review?(nnethercote)
First of all, I don't know Fragmento at all (and thankfully it's gone in 3.6 because it was awful) so Graydon might be a better reviewer.

Second, I'd really like someone experiencing the problem to confirm that it helped.

Third, I'm concerned about Daniel (comment 10 and numerous others) is experiencing this in Fx 3.6, because Fragmento doesn't exist there.

Fourth, it seems really weird that we would get the same hash repeatedly so many times.  And if getAnchor() is so slow, I would expect LookupNativeRegExp() which runs just before it to be really slow as well, because it also iterates through the peer list.

I think the patch is ok for achieving its aim of limiting the length of the peer list.  (Although the default arg and NO_LIMIT aren't necessary because jsregexp.cpp is the only caller.)  But I'm really hesitant to say that this will fix the overall problem.
I am going to give this patch a soft r- (meaning if anyone gives you an r+, I will shut up =) ). I still would like to see why we get so long chains. Just a ton of regexps? Or all hashing to the same slot? I don't think we should let this slide without understanding what happens here, even if the patch makes the problem disappear.
(In reply to comment #64)
> I am going to give this patch a soft r- (meaning if anyone gives you an r+, I
> will shut up =) ). I still would like to see why we get so long chains. Just a
> ton of regexps? Or all hashing to the same slot? I don't think we should let
> this slide without understanding what happens here, even if the patch makes the
> problem disappear.

I feel compelled at this point to protest that I have already said several times that I am trying to get info on this from FB and also plan to do some metrics to answer the questions about regexp hashing. But I don't even think that should be decisive--it's very hard to duplicate this bug, or even know that you've duplicated it.

In other news, regarding this:

> Third, I'm concerned about Daniel (comment 10 and numerous others) is
> experiencing this in Fx 3.6, because Fragmento doesn't exist there.

There are at least two failure modes under discussion in this bug; the patch I posted is intended to treat only one of them, and I have not seen any evidence that that failure mode happens in 3.6.
Comment 63 and comment 64 are gonna make me quote Voltaire (best enemy of good) and you know how I hate to do that...

We need to test the hypothesis. Let's try the patch, in the field.

/be
Further investigation revealed that the regexp issue is actually bug 502058. I rebased that patch for 1.9.1 and requested approval.
Depends on: 502058
Awesome dmandelin. That explains it. 3.6 doesn't have the 1024 bytes limitation, so it probably succeeds compiling the regexp, hence no slowdown there. Great. This makes sense now.
Attachment #423042 - Attachment is obsolete: true
Attachment #423042 - Flags: review?(nnethercote)
Approval ping: we have a patch over in bug 502058 that fixes the underlying bug and needs approval for 1.9.1.
Well, bug 502058 is the underlying bug (at least the well-defined one, and we're always working on general JS perf, which the other issue people are discussing here) and it's fixed, so I'm marking this fixed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
marking fixed on the 1.9.1 branch to match bug 502058
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: