Closed Bug 1697255 Opened 5 years ago Closed 4 years ago

Couldn't inflate notification using media session bitmap

Categories

(Core :: Graphics, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED DUPLICATE of bug 1709406
Tracking Status
firefox86 --- wontfix
firefox87 + wontfix
firefox88 + wontfix
firefox89 + wontfix
firefox90 --- affected
firefox95 --- affected
firefox96 --- affected
firefox97 --- affected

People

(Reporter: royang, Assigned: aosmond, NeedInfo)

References

Details

(Whiteboard: [geckoview:m88][geckoview])

Crash Data

Attachments

(1 file)

https://sentry.prod.mozaws.net/operations/firefox-nightly/issues/6022536/?referrer=github_plugin

RemoteServiceException: Bad notification posted from package org.mozilla.fenix: Couldn't inflate contentViewsjava.lang.IllegalArgumentException: The given region must intersect with the Bitmap's dimensions.
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1828)
    at android.os.Handler.dispatchMessage(Handler.java:105)
    at android.os.Looper.loop(Looper.java:164)
    at android.app.ActivityThread.main(ActivityThread.java:6673)
    at java.lang.reflect.Method.invoke(Method.java)
...
(2 additional frame(s) were not displayed)

Bad notification posted from package org.mozilla.fenix: Couldn't inflate contentViewsjava.lang.IllegalArgumentException: The given region must intersect with the Bitmap's dimensions.
Whiteboard: [geckoview:m89]

This looks to be a specific case of crashes that get aggregated under [@ android.app.RemoteServiceException: at android.app.ActivityThread$H.handleMessage(ActivityThread.java)]

Blocks: 1651221
Crash Signature: [@ android.app.RemoteServiceException: at android.app.ActivityThread$H.handleMessage(ActivityThread.java)]
Severity: -- → S3
Priority: -- → P2
Severity: S3 → S2
Priority: P2 → P1
Whiteboard: [geckoview:m89] → [geckoview:m88]

GeckoView is passing through everything correctly down to gfx, but imgIContainer::GetFrameAtSize/imgIContainer::GetFrame seem to return invalid surfaces here for the URI Google is providing (example image URI).

Can someone from gfx please take a look? This is reliably crashing Fenix. There is a short-term workaround incoming on the Fenix side to mitigate it.

Component: General → Graphics
Flags: needinfo?(aosmond)
Product: GeckoView → Core
Assignee: nobody → aosmond
Status: NEW → ASSIGNED

I'm not sure if this is the problem, but something we need to fix anyways.

Flags: needinfo?(aosmond)

I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.

(In reply to Eugen Sawin [:esawin] from comment #8)

I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.

Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if FLAG_HIGH_QUALITY_SCALING is given:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/image/RasterImage.cpp#328

Rereading your code this morning, I remembered the size given in GetFrameAtSize is completely ignored:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/android/ImageDecoderSupport.cpp#78-87

And you always get the same result GetFrame would have returned.

My understanding is the desired size is:

https://github.com/mozilla-mobile/android-components/blob/09f8e790a498e4ec97173ef88d4de779fe212668/components/browser/engine-gecko-nightly/src/main/java/mozilla/components/browser/engine/gecko/mediasession/GeckoMediaSessionDelegate.kt#L17

Which as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use FLAG_HIGH_QUALITY_SCALING to make sure it is at least not bigger than 48x48 (I imagine this is desirable, helps with the memory footprint too for the decode).

Now the next question, is 1x1 the size you expected from that image. I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value but technically decodeable.

Flags: needinfo?(esawin)
Flags: needinfo?(esawin)
Whiteboard: [geckoview:m88] → [geckoview:m88][geckoview:m89]

(In reply to Andrew Osmond [:aosmond] from comment #9)

(In reply to Eugen Sawin [:esawin] from comment #8)

I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.

Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if FLAG_HIGH_QUALITY_SCALING is given:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/image/RasterImage.cpp#328

Rereading your code this morning, I remembered the size given in GetFrameAtSize is completely ignored:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/android/ImageDecoderSupport.cpp#78-87

And you always get the same result GetFrame would have returned.

My understanding is the desired size is:

https://github.com/mozilla-mobile/android-components/blob/09f8e790a498e4ec97173ef88d4de779fe212668/components/browser/engine-gecko-nightly/src/main/java/mozilla/components/browser/engine/gecko/mediasession/GeckoMediaSessionDelegate.kt#L17

Which as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use FLAG_HIGH_QUALITY_SCALING to make sure it is at least not bigger than 48x48 (I imagine this is desirable, helps with the memory footprint too for the decode).

Thanks, that's good to know, we can file a bug on that, but that's not as critical.

Now the next question, is 1x1 the size you expected from that image. I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value but technically decodeable.

I've linked to URI in comment 4 for which it is failing (it's failing for all album artwork images provided by podcasts.google.com). If you can't reproduce the issue with the decoder on that image, we'll have to check whether we're somehow failing before decoding, e.g., during the resource fetching.

The crash volume here looks extremely worrying. This would drive a Fenix dot release IMO.

Can we please land this patch soon?

Flags: needinfo?(aosmond)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)

Can we please land this patch soon?

The patch is good, but won't fix the crash.
However, AC has merged a workaround patch to mitigate the issue by avoiding crashing.

That workaround doesn't seem to have worked, we're seeing this crash signature on fenix v87.0.0-rc.1 / AC 73.0.11 where that was merged. Fenix nightly is also still crashing.

Flags: needinfo?(esawin)

Nevermind, seems there's unrelated stacks with that signature :(

Flags: needinfo?(esawin)

Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.

Flags: needinfo?(royang)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)

Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.

I've put in a workaround in 87/88 so it would catch this particular issue. I will remove this workaround once the fix is merged on GeckoView side.

Flags: needinfo?(royang)

The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?

Flags: needinfo?(kbrosnan)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)

The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?

Yes, there's a long running issue with Android unhappy that we did not call startForeground in media session service. I added this change in attempt to lower it https://github.com/mozilla-mobile/android-components/issues/9807. This is a known issue and recently Android 12 have a change that will help with this crash. https://developer.android.com/about/versions/12/behavior-changes-12#foreground-service-launch-restrictions. I will continue to monitor and see if I can lower the crash rate more.

This issue shares the same signature but it is very different.

I guess what I'm struggling to understand is how we went from a baseline crash rate of ~2000 crashes/day to ~11000/day in early March, and now we're only down to ~8500/day. I'm trying to understand where those extra crashes are coming from relative to our original longstanding baseline numbers.

MediaSession API was recently updated to work with the latest changes in GeckoView. However, this is concerning as well. I'll look into this to see if my fixes have all made it to release. If so, then I will spend more time to see if there's anything else I can do to lower this crash rate. The challenge with this crash is there's no known way to reproduce it. This might be timing related.

Priority: P1 → P2
Whiteboard: [geckoview:m88][geckoview:m89] → [geckoview:m88][geckoview:m90]

clearing ownership for now, keeping in gfx-triage so we can check back.

Assignee: aosmond → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(aosmond)
Flags: needinfo?(kbrosnan)
No longer blocks: gfx-triage
Whiteboard: [geckoview:m88][geckoview:m90] → [geckoview:m88][geckoview]

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:aosmond, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(esawin)
Flags: needinfo?(aosmond)
Assignee: nobody → aosmond
Status: NEW → ASSIGNED

This have been landed by part of bug 1709406.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: