1697255 - Couldn't inflate notification using media session bitmap

Reporter

Description

5 years ago

https://sentry.prod.mozaws.net/operations/firefox-nightly/issues/6022536/?referrer=github_plugin

RemoteServiceException: Bad notification posted from package org.mozilla.fenix: Couldn't inflate contentViewsjava.lang.IllegalArgumentException: The given region must intersect with the Bitmap's dimensions.
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1828)
    at android.os.Handler.dispatchMessage(Handler.java:105)
    at android.os.Looper.loop(Looper.java:164)
    at android.app.ActivityThread.main(ActivityThread.java:6673)
    at java.lang.reflect.Method.invoke(Method.java)
...
(2 additional frame(s) were not displayed)

Eugen Sawin [:esawin]

Updated

5 years ago

Whiteboard: [geckoview:m89]

Kevin Brosnan [Ex-Mozilla]

Comment 1

5 years ago

This looks to be a specific case of crashes that get aggregated under [@ android.app.RemoteServiceException: at android.app.ActivityThread$H.handleMessage(ActivityThread.java)]

Blocks: 1651221

Crash Signature: [@ android.app.RemoteServiceException: at android.app.ActivityThread$H.handleMessage(ActivityThread.java)]

Roger Yang [:royang]

Reporter

Comment 2

5 years ago

Socorro crash report https://crash-stats.mozilla.org/report/index/5f4faf84-4359-4729-9b88-74a810210309

Emily Toop (:fluffyemily)

Updated

5 years ago

Severity: -- → S3

Priority: -- → P2

Eugen Sawin [:esawin]

Comment 3

5 years ago

STR: https://github.com/mozilla-mobile/fenix/issues/18396

Eugen Sawin [:esawin]

Updated

5 years ago

Severity: S3 → S2

Priority: P2 → P1

Whiteboard: [geckoview:m89] → [geckoview:m88]

Eugen Sawin [:esawin]

Comment 4

5 years ago

GeckoView is passing through everything correctly down to gfx, but imgIContainer::GetFrameAtSize/imgIContainer::GetFrame seem to return invalid surfaces here for the URI Google is providing (example image URI).

Can someone from gfx please take a look? This is reliably crashing Fenix. There is a short-term workaround incoming on the Fenix side to mitigate it.

Component: General → Graphics

Flags: needinfo?(aosmond)

Product: GeckoView → Core

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 5

5 years ago

Edited

https://searchfox.org/mozilla-central/rev/f6ffb71dca9eb491e85aa95042380b2602008b00/widget/android/ImageDecoderSupport.cpp#89-91

Image decoding can fail to give a surface so we need to check for a nullptr.
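
A minimal sketch of the kind of null check described here. The `Surface`, `Size`, `DecodeFrame`, and `FrameSize` names are hypothetical stand-ins invented for illustration; they are not the imagelib or `ImageDecoderSupport.cpp` API, and the attached patch may look quite different.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-ins for illustration; not the real imagelib types.
struct Surface {
  int width;
  int height;
};

struct Size {
  int width;
  int height;
};

// Stand-in for a decode call that may fail and hand back a null pointer.
std::shared_ptr<Surface> DecodeFrame(bool succeed) {
  if (!succeed) {
    return nullptr;
  }
  return std::make_shared<Surface>(Surface{48, 48});
}

// Returns the decoded frame's size, or std::nullopt when no surface exists.
std::optional<Size> FrameSize(bool decodeSucceeds) {
  const std::shared_ptr<Surface> surface = DecodeFrame(decodeSucceeds);
  if (!surface) {
    // Report the failure to the caller instead of dereferencing null.
    return std::nullopt;
  }
  // Assumption of this sketch: a surface that exists has positive dimensions.
  assert(surface->width > 0 && surface->height > 0);
  return Size{surface->width, surface->height};
}
```

The caller can then treat `std::nullopt` as a decode error rather than passing an invalid surface further up.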

Blocks: gfx-triage

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 6

5 years ago

Attached file: Bug 1697255 - Check if imagelib returned null surface in Android helper.

Phabricator Automation

Updated

5 years ago

Assignee: nobody → aosmond

Status: NEW → ASSIGNED

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 7

5 years ago

I'm not sure if this is the problem, but it is something we need to fix anyway.

Flags: needinfo?(aosmond)

Eugen Sawin [:esawin]

Comment 8

5 years ago

I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.
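
To illustrate why a null check alone does not cover this case, here is a rough sketch of a size guard that rejects a non-null but degenerate result. The thread does not show the workaround that was actually merged, so everything here is an assumption: the `Size` type, the `IsUsableArtwork` and `ArtworkOrFallback` helpers, and especially the `kMinArtworkSide` threshold are hypothetical.

```cpp
#include <cassert>

// Hypothetical size type for illustration.
struct Size {
  int width;
  int height;
};

// Hypothetical threshold: the smallest side this sketch accepts as artwork.
constexpr int kMinArtworkSide = 16;

// True when both sides of the decoded image reach the minimum side length.
constexpr bool IsUsableArtwork(Size s, int minSide = kMinArtworkSide) {
  return s.width >= minSide && s.height >= minSide;
}

// Picks the decoded size when it is usable, otherwise the fallback size.
Size ArtworkOrFallback(Size decoded, Size fallback) {
  // The fallback itself has to pass the same check.
  assert(IsUsableArtwork(fallback));
  return IsUsableArtwork(decoded) ? decoded : fallback;
}
```

With this guard, a 1x1 result is replaced by the fallback, while a 48x48 result passes through unchanged.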

Andrew Osmond [:aosmond] (he/him)

Assignee

Comment 9

5 years ago

(In reply to Eugen Sawin [:esawin] from comment #8)

> I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.

Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if FLAG_HIGH_QUALITY_SCALING is given:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/image/RasterImage.cpp#328

Rereading your code this morning, I remembered the size given in GetFrameAtSize is completely ignored:

https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/android/ImageDecoderSupport.cpp#78-87

And you always get the same result GetFrame would have returned.

My understanding is the desired size is:

https://github.com/mozilla-mobile/android-components/blob/09f8e790a498e4ec97173ef88d4de779fe212668/components/browser/engine-gecko-nightly/src/main/java/mozilla/components/browser/engine/gecko/mediasession/GeckoMediaSessionDelegate.kt#L17

Which, as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use FLAG_HIGH_QUALITY_SCALING to make sure it is at least not bigger than 48x48 (I imagine this is desirable, and it helps with the memory footprint of the decode too).
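
The "downscale only, never upscale" rule can be sketched as plain arithmetic. This is not how imagelib computes sizes; `Size` and `DownscaleOnlyTarget` are hypothetical names, and preserving the aspect ratio is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical size type for illustration.
struct Size {
  int width;
  int height;
};

// Returns the largest size that fits in maxSide x maxSide while keeping the
// aspect ratio. A source that already fits is returned unchanged, so the
// result is never larger than the source (no upscaling).
Size DownscaleOnlyTarget(Size src, int maxSide) {
  assert(src.width > 0 && src.height > 0 && maxSide > 0);
  const int longest = std::max(src.width, src.height);
  if (longest <= maxSide) {
    return src;
  }
  // Scale each side by maxSide / longest, rounding to nearest, minimum 1.
  const auto scale = [longest, maxSide](int v) {
    const long long scaled =
        (static_cast<long long>(v) * maxSide + longest / 2) / longest;
    return std::max(1, static_cast<int>(scaled));
  };
  return Size{scale(src.width), scale(src.height)};
}
```

Under this rule a 1x1 source stays 1x1 when 48x48 is requested, which matches the behavior reported in comment 8, while a 200x100 source is reduced to 48x24.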

Now the next question: is 1x1 the size you expected from that image? I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value that is technically decodable.

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

5 years ago

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Updated

5 years ago

Flags: needinfo?(esawin)

Whiteboard: [geckoview:m88] → [geckoview:m88][geckoview:m89]

Eugen Sawin [:esawin]

Comment 10

5 years ago

(In reply to Andrew Osmond [:aosmond] from comment #9)

> (In reply to Eugen Sawin [:esawin] from comment #8)

> > I should have been more specific. We get a non-null surface, but its size is 1x1. So the patch would not fix this issue for these Google images.

> Okay, this suggests maybe the API isn't doing what you would hope for :). Imagelib's APIs cannot upscale an image, only downscale. Downscaling is only done if FLAG_HIGH_QUALITY_SCALING is given:

> https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/image/RasterImage.cpp#328

> Rereading your code this morning, I remembered the size given in GetFrameAtSize is completely ignored:

> https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/android/ImageDecoderSupport.cpp#78-87

> And you always get the same result GetFrame would have returned.

> My understanding is the desired size is:

> https://github.com/mozilla-mobile/android-components/blob/09f8e790a498e4ec97173ef88d4de779fe212668/components/browser/engine-gecko-nightly/src/main/java/mozilla/components/browser/engine/gecko/mediasession/GeckoMediaSessionDelegate.kt#L17

> Which, as written, will never be returned unless the image itself is actually 48x48. You would need to use a DrawTarget to upscale it in native code if that is necessary; you could use FLAG_HIGH_QUALITY_SCALING to make sure it is at least not bigger than 48x48 (I imagine this is desirable, and it helps with the memory footprint of the decode too).

Thanks, that's good to know, we can file a bug on that, but that's not as critical.

> Now the next question: is 1x1 the size you expected from that image? I can't imagine we failed to decode the actual image that badly. If you can provide the raw, encoded data, I can poke around to see if it is decoded correctly, but the most likely explanation is that the server sent us a bad/wrong value that is technically decodable.

I've linked the URI for which it is failing in comment 4 (it's failing for all album artwork images provided by podcasts.google.com). If you can't reproduce the issue with the decoder on that image, we'll have to check whether we're somehow failing before decoding, e.g., during the resource fetching.

Ryan VanderMeulen [:RyanVM]

Comment 11

5 years ago

The crash volume here looks extremely worrying. This would drive a Fenix dot release IMO.

status-firefox86: --- → wontfix

status-firefox87: --- → affected

status-firefox88: --- → affected

status-firefox89: --- → affected

tracking-firefox87: --- → +

tracking-firefox88: --- → +

tracking-firefox89: --- → +

Ryan VanderMeulen [:RyanVM]

Comment 12

5 years ago

Can we please land this patch soon?

Flags: needinfo?(aosmond)

Eugen Sawin [:esawin]

Comment 13

5 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)

> Can we please land this patch soon?

The patch is good, but won't fix the crash.
However, AC has merged a workaround patch to mitigate the issue by avoiding crashing.

Julien Cristau [:jcristau]

Comment 14

5 years ago

That workaround doesn't seem to have worked; we're seeing this crash signature on Fenix v87.0.0-rc.1 / AC 73.0.11, where that was merged. Fenix Nightly is also still crashing.

Flags: needinfo?(esawin)

Julien Cristau [:jcristau]

Comment 15

5 years ago

Never mind, it seems there are unrelated stacks with that signature :(

Flags: needinfo?(esawin)

Ryan VanderMeulen [:RyanVM]

Comment 16

5 years ago

Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.

Flags: needinfo?(royang)

Roger Yang [:royang]

Reporter

Comment 17

5 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #16)

> Roger, can you please confirm whether 87/88 are doing better? It's hard to tell based on the crash data linked from this bug due to the generic signature.

I've put a workaround in 87/88 that catches this particular issue. I will remove this workaround once the fix is merged on the GeckoView side.

Flags: needinfo?(royang)

Ryan VanderMeulen [:RyanVM]

Comment 18

5 years ago

The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?

Flags: needinfo?(kbrosnan)

Roger Yang [:royang]

Reporter

Comment 19

5 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)

> The overall crash volume on this signature is still super high. Do we have other bugs/issues tracking the remaining causes?

Yes, there's a long-running issue where Android is unhappy that we did not call startForeground in the media session service. I added this change in an attempt to lower it: https://github.com/mozilla-mobile/android-components/issues/9807. This is a known issue, and Android 12 recently made a change that will help with this crash: https://developer.android.com/about/versions/12/behavior-changes-12#foreground-service-launch-restrictions. I will continue to monitor and see if I can lower the crash rate further.

That issue shares the same signature as this bug, but it is a very different problem.

Ryan VanderMeulen [:RyanVM]

Comment 20

4 years ago

I guess what I'm struggling to understand is how we went from a baseline crash rate of ~2000 crashes/day to ~11000/day in early March, and now we're only down to ~8500/day. I'm trying to understand where those extra crashes are coming from relative to our original longstanding baseline numbers.

Roger Yang [:royang]

Reporter

Comment 21

4 years ago

The MediaSession API was recently updated to work with the latest changes in GeckoView. However, this is concerning as well. I'll check whether my fixes have all made it to release. If so, I will spend more time seeing whether there's anything else I can do to lower this crash rate. The challenge with this crash is that there's no known way to reproduce it. It might be timing-related.

Emily Toop (:fluffyemily)

Updated

4 years ago

Priority: P1 → P2

Whiteboard: [geckoview:m88][geckoview:m89] → [geckoview:m88][geckoview:m90]

Jim Mathies [:jimm]

Comment 22

4 years ago

Clearing ownership for now, keeping in gfx-triage so we can check back.

Assignee: aosmond → nobody

Status: ASSIGNED → NEW

Flags: needinfo?(aosmond)

Kevin Brosnan [Ex-Mozilla]

Updated

4 years ago

Flags: needinfo?(kbrosnan)

Jim Mathies [:jimm]

Updated

4 years ago

No longer blocks: gfx-triage

Emily Toop (:fluffyemily)

Updated

4 years ago

Whiteboard: [geckoview:m88][geckoview:m90] → [geckoview:m88][geckoview]

BugBot [:suhaib / :marco/ :calixte]

Comment 23

4 years ago

There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:aosmond, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(esawin)

Flags: needinfo?(aosmond)

Pascal Chevrel:pascalc

Updated

4 years ago

status-firefox87: affected → wontfix

status-firefox88: affected → wontfix

status-firefox89: affected → wontfix

status-firefox90: --- → affected

Chris Peterson [:cpeterson]

Updated

4 years ago

status-firefox95: --- → affected

status-firefox96: --- → affected

status-firefox97: --- → affected

OS: All → Android

Phabricator Automation

Updated

4 years ago

Assignee: nobody → aosmond

Status: NEW → ASSIGNED

Makoto Kato [:m_kato]

Comment 24

4 years ago

This has landed as part of bug 1709406.

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

Resolution: --- → DUPLICATE

Andrew Osmond [:aosmond] (he/him)

Assignee

Updated

4 years ago

Flags: needinfo?(aosmond)