Couldn't inflate notification using media session bitmap
Categories
(Core :: Graphics, defect, P2)
Tracking
()
People
(Reporter: royang, Assigned: aosmond, NeedInfo)
References
Details
(Whiteboard: [geckoview:m88][geckoview])
Crash Data
Attachments
(1 file)
https://sentry.prod.mozaws.net/operations/firefox-nightly/issues/6022536/?referrer=github_plugin
RemoteServiceException: Bad notification posted from package org.mozilla.fenix: Couldn't inflate contentViewsjava.lang.IllegalArgumentException: The given region must intersect with the Bitmap's dimensions.
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1828)
at android.os.Handler.dispatchMessage(Handler.java:105)
at android.os.Looper.loop(Looper.java:164)
at android.app.ActivityThread.main(ActivityThread.java:6673)
at java.lang.reflect.Method.invoke(Method.java)
...
(2 additional frame(s) were not displayed)
Bad notification posted from package org.mozilla.fenix: Couldn't inflate contentViewsjava.lang.IllegalArgumentException: The given region must intersect with the Bitmap's dimensions.
Updated•5 years ago
|
Comment 1•5 years ago
|
||
This looks to be a specific case of crashes that get aggregated under [@ android.app.RemoteServiceException: at android.app.ActivityThread$H.handleMessage(ActivityThread.java)]
| Reporter | ||
Comment 2•5 years ago
|
||
Socorro crash report https://crash-stats.mozilla.org/report/index/5f4faf84-4359-4729-9b88-74a810210309
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Updated•5 years ago
|
Comment 4•5 years ago
|
||
GeckoView is passing through everything correctly down to gfx, but imgIContainer::GetFrameAtSize/imgIContainer::GetFrame seem to return invalid surfaces here for the URI Google is providing (example image URI).
Can someone from gfx please take a look? This is reliably crashing Fenix. There is a short-term workaround incoming on the Fenix side to mitigate it.
| Assignee | ||
Comment 5•5 years ago
•
|
||
Image decoding can fail to give a surface so we need to check for a nullptr.
| Assignee | ||
Comment 6•5 years ago
|
||
Updated•5 years ago
|
| Assignee | ||
Comment 7•5 years ago
|
||
I'm not sure if this is the problem, but something we need to fix anyways.
Comment 8•5 years ago
|
||
I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.
| Assignee | ||
Comment 9•5 years ago
|
||
(In reply to Eugen Sawin [:esawin] from comment #8)
I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.
Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if FLAG_HIGH_QUALITY_SCALING is given:
Rereading your code this morning, I remembered the size given in GetFrameAtSize is completely ignored:
And you always get the same result GetFrame would have returned.
My understanding is the desired size is:
Which as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use FLAG_HIGH_QUALITY_SCALING to make sure it is at least not bigger than 48x48 (I imagine this is desirable, helps with the memory footprint too for the decode).
Now the next question, is 1x1 the size you expected from that image. I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value but technically decodeable.
| Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 10•5 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #9)
(In reply to Eugen Sawin [:esawin] from comment #8)
I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.
Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if
FLAG_HIGH_QUALITY_SCALINGis given:Rereading your code this morning, I remembered the size given in
GetFrameAtSizeis completely ignored:And you always get the same result
GetFramewould have returned.My understanding is the desired size is:
Which as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use
FLAG_HIGH_QUALITY_SCALINGto make sure it is at least not bigger than 48x48 (I imagine this is desirable, helps with the memory footprint too for the decode).
Thanks, that's good to know, we can file a bug on that, but that's not as critical.
Now the next question, is 1x1 the size you expected from that image. I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value but technically decodeable.
I've linked to URI in comment 4 for which it is failing (it's failing for all album artwork images provided by podcasts.google.com). If you can't reproduce the issue with the decoder on that image, we'll have to check whether we're somehow failing before decoding, e.g., during the resource fetching.
Comment 11•5 years ago
|
||
The crash volume here looks extremely worrying. This would drive a Fenix dot release IMO.
Comment 13•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
Can we please land this patch soon?
The patch is good, but won't fix the crash.
However, AC has merged a workaround patch to mitigate the issue by avoiding crashing.
Comment 14•5 years ago
|
||
That workaround doesn't seem to have worked, we're seeing this crash signature on fenix v87.0.0-rc.1 / AC 73.0.11 where that was merged. Fenix nightly is also still crashing.
Comment 15•5 years ago
|
||
Nevermind, seems there's unrelated stacks with that signature :(
Comment 16•5 years ago
|
||
Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.
| Reporter | ||
Comment 17•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)
Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.
I've put in a workaround in 87/88 so it would catch this particular issue. I will remove this workaround once the fix is merged on GeckoView side.
Comment 18•5 years ago
|
||
The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?
| Reporter | ||
Comment 19•5 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)
The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?
Yes, there's a long running issue with Android unhappy that we did not call startForeground in media session service. I added this change in attempt to lower it https://github.com/mozilla-mobile/android-components/issues/9807. This is a known issue and recently Android 12 have a change that will help with this crash. https://developer.android.com/about/versions/12/behavior-changes-12#foreground-service-launch-restrictions. I will continue to monitor and see if I can lower the crash rate more.
This issue shares the same signature but it is very different.
Comment 20•4 years ago
|
||
I guess what I'm struggling to understand is how we went from a baseline crash rate of ~2000 crashes/day to ~11000/day in early March, and now we're only down to ~8500/day. I'm trying to understand where those extra crashes are coming from relative to our original longstanding baseline numbers.
| Reporter | ||
Comment 21•4 years ago
|
||
MediaSession API was recently updated to work with the latest changes in GeckoView. However, this is concerning as well. I'll look into this to see if my fixes have all made it to release. If so, then I will spend more time to see if there's anything else I can do to lower this crash rate. The challenge with this crash is there's no known way to reproduce it. This might be timing related.
Updated•4 years ago
|
Comment 22•4 years ago
|
||
clearing ownership for now, keeping in gfx-triage so we can check back.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 23•4 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:aosmond, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Comment 24•4 years ago
|
||
This have been landed by part of bug 1709406.
| Assignee | ||
Updated•4 years ago
|
Description
•