Open Bug 1239376 Opened 4 years ago Updated 2 months ago

Ship Fennec Nightly with StrictMode leaks set to penaltyDeath for a few days

Categories

(Firefox for Android :: Data Providers, defect)

defect
Not set

Tracking

()

People

(Reporter: mfinkle, Unassigned, NeedInfo)

References

Details

(Keywords: leave-open)

Attachments

(2 files)

We want to get a handle on SQLite crashes [1] that might be related to leaking cursors. If we set the StrictMode to penaltyLog and penaltyDeath [2], we might be able to get crash stacks which point to leaking cursors.

We get the SQLite crashes on Nightly, so lets run with this configuration for a few days to get some code coverage for external users. If we don't see any crashes due to StrictMode leaks, we may be able to look at other issues.

[1] https://crash-stats.mozilla.com/signature/?product=FennecAndroid&signature=android.database.CursorWindowAllocationException%3A+Cursor+window+allocation+of+2048+kb+failed.+at+android.database.CursorWindow.%3Cinit%3E%28CursorWindow.java%29&version=43.0&date=%3C2016-01-13T17%3A44%3A59&date=%3E%3D2016-01-06T17%3A44%3A59

[2] http://stackoverflow.com/a/28155638
We are at the start of a release cycle should we do this?
Flags: needinfo?(mark.finkle)
(In reply to Kevin Brosnan [:kbrosnan] from comment #1)
> We are at the start of a release cycle should we do this?

I'm fine with doing this. Remember, we only want to run this trial for long enough to see what leak "crashes" we are finding. So maybe a week?

Defer to Margaret to finalize and get it landed.
Flags: needinfo?(mark.finkle) → needinfo?(margaret.leibovic)
Sure, let's get this landed and monitor what happens.

Mike or Andrew, can you land a patch for this? We should aim to get this done this week.
Flags: needinfo?(michael.l.comella)
Flags: needinfo?(margaret.leibovic)
Flags: needinfo?(ahunt)
I found penaltyDropBox [1] (rather than death) which is a less aggressive but not-feasible alternative here – I filed bug 1243471 for future use.

[1]: https://developer.android.com/reference/android/os/StrictMode.VmPolicy.Builder.html#penaltyDropBox%28%29
I see we want to enable crashing on Nightly builds but nothing is said of developer builds. I initially assumed we'd want to do this for developer builds as well but it's worth noting that if this crashes a lot (I'll test for it), it could hinder devs from developing features so it's possible we'd want to enable this in Nightly builds but not developer builds.
Assignee: nobody → michael.l.comella
Flags: needinfo?(michael.l.comella)
We apparently don't even start up without StrictMode crashing us due to:

01-27 11:40:01.720 E/AndroidRuntime(10078):     at org.mozilla.gecko.SysInfo.getMemSize(SysInfo.java:148)
01-27 11:40:01.720 E/AndroidRuntime(10078):     at org.mozilla.gecko.util.HardwareUtils.getMemSize(HardwareUtils.java:85)
01-27 11:40:01.720 E/AndroidRuntime(10078):     at org.mozilla.gecko.GeckoAppShell.isHighMemoryDevice(GeckoAppShell.java:1461)
01-27 11:40:01.720 E/AndroidRuntime(10078):     at org.mozilla.gecko.GeckoAppShell.getScreenDepth(GeckoAppShell.java:1476)
01-27 11:40:01.720 E/AndroidRuntime(10078):     at org.mozilla.gecko.BrowserApp.onCreate(BrowserApp.java:756)

Looking at a fix...
The failure here is because we only want to do 24-bit color on high-memory devices.

Computing if we're a high-memory device requires reading a 'file'… but that file is /proc/meminfo.

So I suggest one of two fixes:

1. SysInfo.java should muffle disk read strict mode warnings when reading /proc/meminfo, because *it's not a real file*. It's totally fine to do that on the main thread. (Bug 926990 for context.)

2. Don't check if we're a high-memory device before choosing 24-bit color. I don't know what the scope of this decision is -- for favicons it's likely not significant.



Note that (1) was kinda done in Bug 1070231, so read that for more context.
Opted for (1) because if it's totally fine to do, then we should remove the warning.

Next startup failure (looks like bug 1189347 may be related):

01-27 12:07:34.350 E/AndroidRuntime(26614):     at android.os.UserManager.getApplicationRestrictions(UserManager.java:611)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.Restrictions.isRestrictedProfile(Restrictions.java:78)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.Restrictions.createConfiguration(Restrictions.java:45)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.Restrictions.getConfiguration(Restrictions.java:31)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.Restrictions.update(Restrictions.java:82)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.GeckoApp.onResume(GeckoApp.java:2008)
01-27 12:07:34.350 E/AndroidRuntime(26614):     at org.mozilla.gecko.BrowserApp.onResume(BrowserApp.java:914)
It looks like at the end of onResume, we update our Restrictions which fails when trying to get the restrictions from the framework's UserManager, which is used to determine isRestrictedProfile.

The `update` is called on the returned Configuration but at the moment, `update` doesn't seem to do too much. In RestrictedProfileConfiguration, it just updates a cache invalid variable – maybe that operation could be delayed.

To avoid digging into this too much if Sebastian already has some context...

Sebastian, do you have a good idea about how to fix this StrictMode violation?
Flags: needinfo?(s.kaspari)
Attachment #8712800 - Attachment description: MozReview Request: Bug 1239376 - Enable StrictMode penaltyDeath on developer local & nightly builds. r=mfinkle → MozReview Request: Bug 1239376 - Fix StrictMode thread read access in SysInfo.getMemSize. r=mfinkle
Attachment #8712800 - Flags: review?(mark.finkle)
Comment on attachment 8712800 [details]
MozReview Request: Bug 1239376 - Fix StrictMode thread read access in SysInfo.getMemSize. r=mfinkle

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32657/diff/1-2/
Comment on attachment 8712800 [details]
MozReview Request: Bug 1239376 - Fix StrictMode thread read access in SysInfo.getMemSize. r=mfinkle

https://reviewboard.mozilla.org/r/32657/#review29481

LGTM
Attachment #8712800 - Flags: review?(mark.finkle) → review+
https://reviewboard.mozilla.org/r/32663/#review29483

Do we want to penaltyDeath on all strict mode violations? Maybe we want a very specific code path for this experiment. In fact, now that I called it an experiment, why don't we use SwitchBoard? I bet we don't get the SwitchBoard configuration fast enough.

Back to my point: We probably want a very specific code path, for "nightly" only and only for resource leaks and penaltyDeath.
(In reply to Michael Comella (:mcomella) from comment #10)
> It looks like at the end of onResume, we update our Restrictions which fails
> when trying to get the restrictions from the framework's UserManager, which
> is used to determine isRestrictedProfile.
> 
> The `update` is called on the returned Configuration but at the moment,
> `update` doesn't seem to do too much. In RestrictedProfileConfiguration, it
> just updates a cache invalid variable – maybe that operation could be
> delayed.
> 
> To avoid digging into this too much if Sebastian already has some context...
> 
> Sebastian, do you have a good idea about how to fix this StrictMode
> violation?

UserManager.getApplicationRestrictions() always reads and parses an XML file from disk. We optimized this in bug 1189347 to minimize the number of disk reads to just one (or two, see below) and cache the restrictions. However as Restrictions.isAllowed() is heavily used to customize the UI we can't get this read fully off the UI thread. In bug 1189347 I also tried moving the read to a background thread but the results are needed so early on the UI thread so that the background thread never completed in time.

Because restrictions can only be updated by the device admin and because that requires leaving the app we invalidate our cache in onResume(). We could micro-optimize this here and not create a configuration in Restrictions.update() if there's none already (because this means there's also no cache to invalidate yet). But this will just delay the strict mode violation.

I guess we'll have to accept the violation here and wrap the code in Restrictions.isRestrictedProfile() like we did in RestrictedProfileConfiguration.readRestrictions().

And now that I look at it: We are reading the application restrictions twice: Once in Restrictions to determine if we are a restricted profile and a second time to create the cache in RestrictedProfileConfiguration. We should optimize this to at least do only one disk read until the cache is invalid. But this is something for a different bug. I'll file.
Flags: needinfo?(s.kaspari)
(In reply to Sebastian Kaspari (:sebastian) from comment #15)
> But this is something for a different bug. I'll file.

Bug 1244071
Unassigning in favor of bug 1263693.

Figured I should land the one StrictMode fix though.
Assignee: michael.l.comella → nobody
Keywords: leave-open

The leave-open keyword is there and there is no activity for 6 months.
:Grisha, maybe it's time to close this bug?

Flags: needinfo?(gkruglov)

The leave-open keyword is there and there is no activity for 6 months.
:Grisha, maybe it's time to close this bug?

Flags: needinfo?(gkruglov)
You need to log in before you can comment on or make changes to this bug.