Last Comment Bug 973824 - Memory pressure watcher causes CPU spin
: Memory pressure watcher causes CPU spin
Status: RESOLVED FIXED
: qablocker, regression, smoketest
Product: Firefox OS
Classification: Client Software
Component: General (show other bugs)
: unspecified
: ARM Gonk (Firefox OS)
-- normal (vote)
: 1.4 S1 (14feb)
Assigned To: Thomas Zimmermann [:tzimmermann] [:tdz]
:
:
Mentors:
: 973940 (view as bug list)
Depends on:
Blocks: 973689 970895
  Show dependency treegraph
 
Reported: 2014-02-18 01:45 PST by Cervantes Yu [:cyu] [:cervantes]
Modified: 2014-02-18 16:52 PST (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
1.4+
fixed


Attachments
[01] Bug 973824: Revert bug 970895 (17.99 KB, patch)
2014-02-18 05:42 PST, Thomas Zimmermann [:tzimmermann] [:tdz]
no flags Details | Diff | Splinter Review

Description User image Cervantes Yu [:cyu] [:cervantes] 2014-02-18 01:45:00 PST
We found that on today's build, GP has CPU spin on the IO threads of several processes. The CPU spin disappears when we disable GonkMemoryPressureWatcher. Adding log to MemoryPressureWatcher::CheckForMemoryPressure() and then we saw logs of log messages appear in adb logcat.

Reverting the following change makes the problem go away:
parent:      169119:6f957a000186
user:        Thomas Zimmermann <tdz@users.sourceforge.net>
date:        Mon Feb 17 12:29:26 2014 +0100
summary:     Bug 970895: Use I/O loop for polling memory-pressure events, r=dhylands

I guess GP's memory pressure device doesn't support epoll. We need GP to confirm.
Comment 1 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 01:48:57 PST
This is probably a regression from bug 970895.
Comment 2 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 01:51:18 PST
I'll have a look.
Comment 3 User image Cervantes Yu [:cyu] [:cervantes] 2014-02-18 01:52:53 PST
MemoryPressureWatcher was originally using poll(), but running on the IPC thread makes it use epoll(). I think this is the main difference.

This bug appears on GP only (for now, other devices not checked yet). unagi isn't found to have this CPU spin.
Comment 4 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 01:58:51 PST
There is tons of stuff running on the I/O thread. And non of this causes any CPU spinning. Maybe the lowmem interface is different on GP. Do you use Peak or Keon?
Comment 5 User image Cervantes Yu [:cyu] [:cervantes] 2014-02-18 02:11:09 PST
It's peak.
Comment 6 User image Gabriele Svelto [:gsvelto] 2014-02-18 04:50:00 PST
(In reply to Cervantes Yu from comment #3)
> This bug appears on GP only (for now, other devices not checked yet). unagi
> isn't found to have this CPU spin.

unagis don't have the appropriate sysfs interface so we don't check for memory pressure there, I'll test on my hamachi and see if the problem reproduces there too.
Comment 7 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 04:51:09 PST
I haven't see this problem on the Hamachi when testing bug 970895.
Comment 8 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 04:55:59 PST
I think I've found the bug. Need to check this though.
Comment 9 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 05:30:09 PST
I think the problem is how we poll the sysfs file. According to the commit message for poll-able sysfs files [1] the correct way is to poll for POLLPRI events. The original code did this, but the I/O thread uses POLLIN, which is always true.

I guess we need to revert bug 970895. :(


[1] https://lkml.org/lkml/2006/4/14/126
Comment 10 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 05:42:14 PST
Created attachment 8377537 [details] [diff] [review]
[01] Bug 973824: Revert bug 970895
Comment 11 User image Thomas Zimmermann [:tzimmermann] [:tdz] 2014-02-18 05:42:48 PST
Now that I know what to look for, it is reproducible on the Hamachi.
Comment 12 User image Gabriele Svelto [:gsvelto] 2014-02-18 05:58:13 PST
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #9)
> I think the problem is how we poll the sysfs file. According to the commit
> message for poll-able sysfs files [1] the correct way is to poll for POLLPRI
> events. The original code did this, but the I/O thread uses POLLIN, which is
> always true.

Awww, yes. The node is always readable so POLLIN will always return immediately.

> I guess we need to revert bug 970895. :(

Is there no way to poll for POLLPRI in the I/O thread?
Comment 13 User image Gabriele Svelto [:gsvelto] 2014-02-18 08:21:41 PST
*** Bug 973940 has been marked as a duplicate of this bug. ***
Comment 14 User image Tony Chung [:tchung] 2014-02-18 08:32:53 PST
Flagging for qablocker, as this regression is blocking bug 973940, which is a automation and smoketest blocker.
Comment 15 User image Tony Chung [:tchung] 2014-02-18 09:23:24 PST
Editing subject, as this is not just Geeksphone related.   Testers in bug 973940 have reported this regression has occured on the Buri devices.

Build info:
Gaia      ae90f9b322509ee09fbd3963bd23e142845613ab
Gecko     https://hg.mozilla.org/mozilla-central/rev/318c0d6e24c5
BuildID   20140218040203
Version   30.0a1
ro.build.version.incremental=eng.zxliu.20130911.142924
ro.build.date=Wed Sep 11 14:29:37 CST 2013
Comment 16 User image [:fabrice] Fabrice Desré 2014-02-18 09:33:38 PST
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #10)
> Created attachment 8377537 [details] [diff] [review]
> [01] Bug 973824: Revert bug 970895

No need for reviews to land backouts
Comment 17 User image Jason Smith [:jsmith] 2014-02-18 09:48:11 PST
Going to ping a sheriff for a backout on m-c asap.
Comment 18 User image Jason Smith [:jsmith] 2014-02-18 09:49:49 PST
Clearing needinfo on geeksphone as the issue isn't geeksphone-specific.
Comment 19 User image Jason Smith [:jsmith] 2014-02-18 09:52:06 PST
Comment on attachment 8377537 [details] [diff] [review]
[01] Bug 973824: Revert bug 970895

Ed is in the process of backing this patch out on m-c, so clearing the review.
Comment 20 User image Jason Smith [:jsmith] 2014-02-18 09:54:27 PST
And...backed out.

https://hg.mozilla.org/mozilla-central/rev/eb675c2b6dee
Comment 21 User image Mike Habicher [:mikeh] (high bugzilla latency) 2014-02-18 11:59:21 PST
Looks like this is still affecting b2g-inbound.

Note You need to log in before you can comment on or make changes to this bug.