Closed Bug 1215030 Opened 10 years ago Closed 9 years ago

NFS home Extreme CPU usage on OS X 10.10.5 with Firefox 41.0.1

Categories

(Firefox :: Untriaged, defect, P2)

41 Branch
x86_64
macOS
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: michalm.mac, Unassigned, NeedInfo)

Details

Attachments

(8 files, 1 obsolete file)

Attached file firefox_sample.txt (obsolete)
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150929144111
Steps to reproduce:
We have OS X 10.10.5 clients with Firefox 41.0.1 installed. The user home directory is located on a share mounted via the NFSv3 protocol: /home/users/m (nfs, asynchronous, nodev, nosuid, automounted, noatime, nobrowse)
Actual results:
After starting, Firefox tends to sit at 100% CPU usage. Sometimes it even reaches 200% or 300%. This points to one or more rogue threads consuming entire CPU cores. I took a sample (trace) of the Firefox process through Activity Monitor. See attachment. Could this be related to the SQLite database files, which are not very fond of NFS?
Expected results:
An idle Firefox should not consume an entire CPU core.
Severity: normal → major
OS: Unspecified → Mac OS X
Priority: -- → P2
Hardware: Unspecified → x86_64
There were some bad symbols in the report, so I resymbolicated it using https://github.com/mstange/analyze-tryserver-profiles/blob/master/resymbolicate_activitymonitorsample.py .
Attachment #8674128 - Attachment is obsolete: true
Looks like SQLite is keeping busy all the time. Any ideas for further diagnosis?
Flags: needinfo?(dteller)
(In reply to Markus Stange [:mstange] from comment #2) > Looks like SQLite is keeping busy all the time. Any ideas for further diagnosis? I am no dev. I would gladly gather more data. Just tell me how :-)
What makes you think that this is related to sqlite? As far as I know, only the parent process has access to mozStorage, not the child processes. I do see what looks like lots of time spent in `mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement`, but I'm not sure this is out of the ordinary.
Flags: needinfo?(dteller) → needinfo?(michalm.mac)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #4) > What makes you think that this is related to sqlite? SQLite was my first guess. The problem is somehow related to the NFS home folder. I'm not sure how to debug it or gather more information.
Flags: needinfo?(michalm.mac)
This is Firefox 41, so no e10s. There's hardly any idle time inside executeAndProcessStatement, it's mostly acquiring and releasing locks. Are you saying it's normal to have mozStorage running at 100% CPU over extended periods of time?
By the way, the home folder is mounted with the locallocks option since our NFSv3 server does not handle locking. From the mount_nfs man page: "locallocks: Perform all file locking operations locally on the NFS client (in the VFS layer) instead of on the NFS server. This option can provide file locking support on an NFS file system for which the server does not support file locking. However, because the file locking is only performed on the client, the NFS server and other NFS clients will have no knowledge of the locks. Note: mounts which are both soft and read-only will also have the locallocks mount option enabled by default - unless explicitly overridden with a lock option (for example, nolocks or nolocallocks)."
Ah, my bad about the e10s stuff, I misread comment 0. Regarding `executeAndProcessStatement`, I'm not very used to reading dumps from this profiler, so I may be misinterpreting, but doesn't the 2077 mean that it's on the stack in 2077 samples out of 2268?
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #8) > doesn't the 2077 mean > that it's on the stack in in 2077 samples out of 2268? Correct. Though looking at the functions that take up the remaining 191 samples, those are probably also happening inside executeAndProcessStatement and the stack unwinder just failed to unwind correctly.
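The sample counts discussed above can be turned into a share of all samples. A minimal sketch that uses only the two figures quoted in this thread (2077 and 2268); the function and variable names are illustrative and not part of any profiler tooling:

```python
def stack_share(samples_on_stack: int, total_samples: int) -> float:
    """Return the fraction of samples in which a frame appears on the stack."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    return samples_on_stack / total_samples


on_stack = 2077  # samples with executeAndProcessStatement on the stack
total = 2268     # total samples quoted in the comment above

share = stack_share(on_stack, total)
remaining = total - on_stack  # samples the unwinder did not attribute

print(f"{share:.1%} of samples, {remaining} not attributed")
# prints "91.6% of samples, 191 not attributed"
```

If the remaining 191 samples are also inside executeAndProcessStatement, as suggested above, the real share would be closer to 100%.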
Michalm, would you be willing to try narrowing down the cause via the mozregression tool? http://mozilla.github.io/mozregression/ Assuming that this used to work correctly in Firefox 40 (or another previous version), |mozregression --good-version 40 --bad-version 41| should be all you need to get started (change "40" to whatever fits the last known good version). If it's easily reproducible, tracking it down shouldn't take more than 10-15min.
Flags: needinfo?(michalm.mac)
> Assuming that this used to work correctly in Firefox 40 I forgot to mention that we encountered this problem before with older versions, when we were testing the deployment of our new Apple classroom. That was back in April-May. I did not pay much attention to the issue back then. Heavy Firefox usage during the last month showed us that the problem happens quite often. So I cannot pin this problem to any specific version. I could test some much older version of Firefox, but that is not so useful, right?
Flags: needinfo?(michalm.mac)
Can't hurt to try, if you don't mind! If this is a regression from something that used to work, being able to track down the regressing change might at least help point in the right direction of what needs fixing.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #12) > Can't hurt to try, Ok. I'll try this tomorrow. Are there any other tools I could use to gather more information?
I'll have to defer to David and Markus on that :)
Attached file 20.0_cpu_sample.txt
Attached file 32.0.1_cpu_sample.txt
Attached file 41.0.2_cpu_sample.txt
I was able to confirm this problem in Firefox 41.0.2, 32.0.1 and 20.0.0. So far I am UNABLE to reproduce it in Firefox 10.0.0 and 3.6.9. Another observation: when the problem is happening, Firefox 41 freezes (beachball) on application Quit. I took samples and spindumps through Activity Monitor. See attachments.
Awesome! So we know it regressed somewhere between Fx10 and Fx20. With the mozregression tool, a command like |mozregression --good-release 10 --bad-release 20| should be able to automate further narrowing it down to a much smaller range without taking a ton of time. And no, you don't need to attach a dump for every version tested during that process :)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #23) Ok. I'll get back to this next week.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #23) Unfortunately all mozregression files are stored in /var/folders on the LOCAL hfs+ filesystem. I need to put them inside the user home folder in /home/users/something/something, which is mounted via NFS. Is there a way to specify a "working directory" for mozregression? I found the --profile switch in the program's help, but I don't know what is meant by profile in this context. Nevertheless, even with --profile ~/something I can still see Firefox files being accessed in /var/folders (I used fs_usage to monitor file changes and filtered for sqlite files).
I don't see a way to change where the downloaded builds are extracted and run from, no :( Julien, see the question in comment 25. Is this something mozregression supports or could be made to support?
Flags: needinfo?(j.parkouss)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26) > I don't see a way to change where the downloaded builds are extracted and > run from, no :( I don't need builds themselves on NFS share. I only need equivalent of ~/Library/Application Support/Firefox + other files in ~/Library (Caches, Preferences, etc.). Basically everything Firefox stores in user home folder.
(In reply to michalm.mac from comment #27) > (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26) > > I don't see a way to change where the downloaded builds are extracted and > > run from, no :( > I don't need builds themselves on NFS share. I only need equivalent of > ~/Library/Application Support/Firefox + other files in ~/Library (Caches, > Preferences, etc.). Basically everything Firefox stores in user home folder. So no, mozregression does not have a way to do that yet: extracting the build, working with the profile and running the build are all done in the temp folder. So do you need what is currently present in your /var/folders (created by mozregression) to be in another folder? I think we could add a workdir option, where at least the profile, archive extraction and run could happen. I believe that preferences are somewhere under the profile, so it should be good for that (I must say I don't know about the caches). May I ask why you need that? I'm not against it at all - I just want to understand the need better. :) FYI, --profile is basically there if you want to run Firefox builds against a given profile (this allows you, for example, to define preferences that would be set for each tested build). You can also look at --profile-persistence if you want fine-grained control over this option.
Flags: needinfo?(j.parkouss)
(In reply to Julien Pagès (:parkouss) from comment #28) > (In reply to michalm.mac from comment #27) > May I ask why you need that ? I'm not against it at all - I just want to > understand better the need here. :) I will summarize. I found a problem with Firefox which happens when the user home folder is located on network storage mounted with the NFSv3 protocol. Everything works as expected for a user with a local home directory (say /Users on the system volume with an hfs+ filesystem). @Ryan asked me to run the mozregression tool. Since mozregression stores its data in a temp directory on the local filesystem, I am unable to replicate the problem with it.
Thanks for the explanation! But if we create a working dir and put everything in there, the Firefox binary will also be in the same dir, so will that help to reproduce the bug? Or maybe we should also be able to define a "profile working dir", so that the binaries can be in one place and the profile in another. Or maybe just the "profile working dir" would be sufficient? Anyway, filed bug 1219823 to look into this. As a possible workaround (if all the data you need is under the profile), you can try to first create a profile, put it somewhere on the NFS share and use the following flags: > --profile=/path/to/profile --profile-persistence=reuse The "reuse" value forces mozregression not to clone the profile before each test; it reuses the same profile each time. It modifies the given profile in place, so be sure to only do that with a profile created specifically for this purpose. Also keep in mind that the profile is reused for all tested builds. Tell me if that helps.
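The workaround above can be combined with the release range suggested earlier in this bug. A minimal sketch that only assembles and prints the command line, without running anything. The profile path is a hypothetical placeholder, the helper name is illustrative, and whether a given mozregression version accepts exactly this combination of flags is an assumption:

```python
import shlex


def build_mozregression_cmd(good_release, bad_release, profile_dir):
    """Assemble a mozregression argv that reuses one profile for every build."""
    return [
        "mozregression",
        "--good-release", str(good_release),  # last release known to work
        "--bad-release", str(bad_release),    # first release known to fail
        f"--profile={profile_dir}",           # profile stored on the NFS home
        "--profile-persistence=reuse",        # do not clone the profile per build
    ]


# Hypothetical profile directory on the NFS-mounted home folder.
cmd = build_mozregression_cmd(10, 20, "/home/users/example/ff-test-profile")
print(shlex.join(cmd))
```

Because "reuse" modifies the profile in place, the directory passed here should be a throwaway profile created only for this bisection.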
(In reply to Julien Pagès (:parkouss) from comment #30) > > --profile=/path/to/profile --profile-persistence=reuse > Thank you. This option does the trick. I did more testing and the problem seems to happen all the way down to version 8.0a1. Replicating it is a bit random: launch Firefox/Nightly and browse a lot of web pages. Sometimes the high CPU usage happens instantly, other times it takes a while or even multiple tests with a new profile. I can test even older versions in the upcoming weeks.
Resolved-Incomplete due to time since last communication/update by reporter. Please feel free to reopen if the error occurs in a current Firefox version.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → INCOMPLETE
(In reply to Michelle Funches - QA from comment #32) > Resolved-Incomplete due to time since last communication/update by reporter. > Please feel free to reopen if the error occurs in a current Firefox version. Lack of communication? I have not found out anything new since my last post. That does not mean the bug/deficiency is gone.
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
Given that we know that this bug goes back a long ways (and trying to find an earliest version that reproduces isn't likely to produce a lot of fruit), it might be worth seeing if the profiler shows anything interesting too. https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler
Hi Michalm: Does the issue still exist? Have you tried the profiler as suggested in Comment 34?
Flags: needinfo?(michalm.mac)
Hello. I will be able to test this issue again in September (the classroom deployment is scheduled for September; until then I won't be present on site).
Michalm, have you been able to revisit this issue? Is there any information that you can pass on to us? Thanks
Marking this as Resolved: Incomplete due to the lack of response from the reporter. If anyone can still reproduce it on latest versions, feel free to reopen the issue and provide more information.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago → 9 years ago
Resolution: --- → INCOMPLETE