Closed Bug 842820 Opened 11 years ago Closed 10 years ago

3rd party app WP-Photos experiences an abnormal process termination during upload

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(b2g18+ affected)

RESOLVED FIXED
Tracking Status
b2g18 + affected

People

(Reporter: adora, Unassigned)

Details

(Whiteboard: [apps watch list][MemShrink:P2])

During testing, at first only the app itself crashed and could be relaunched, but now the launcher occasionally disappears, forcing me to reboot the device.

Steps to duplicate:
1) Log in with the following info:
Username: mozappreview
Password: reviewmozapp
Blog:  http://moztestblog.wordpress.com

2) Tap the + button in the upper right hand corner to upload a photo.
3) Select Camera, and take a picture.  OR:  Select Gallery and choose a picture that was taken with the camera.
4) Tap the blue Save button in the upper right corner
5) Tap OK to publish now

Observe: after 1-10 seconds, app will crash.

App does NOT crash when selecting a screenshot from the Gallery. Maybe this is due to a file size issue? 

The developer is not able to duplicate this issue on his device.

Log:  https://gist.github.com/lisabrewster/3ab11355f6ca2850cae5
Link to app: https://marketplace.firefox.com/app/wp-photos/
Talked with Lisa about this in IRC.

She's experienced an OS restart crash twice so far with this STR, but that piece isn't consistently reproducible. The app-only crash however is consistently reproducible.

Noming to block as we can't have any OS restart crashes.
blocking-b2g: --- → tef?
Component: Gaia → General
Keywords: crash, reproducible
(In reply to Jason Smith [:jsmith] from comment #1)
> Talked with Lisa about this in IRC.
> 
> She's experienced an OS restart crash twice so far with this STR, but that
> piece isn't consistently reproducible. The app-only crash however is
> consistently reproducible.
> 
> Noming to block as we can't have any OS restart crashes.

Oh wait, disregard. This isn't a OS restart crash. I'll hold on noming until I get more info.
blocking-b2g: tef? → ---
I'm going to grab a stack on this.
QA Contact: jsmith
I can reproduce - looks like the process adnormally getting terminated.
Logcat only shows:

02-19 16:41:42.185: I/Gecko(108): ###!!! [Parent][AsyncChannel] Error: Channel error: cannot send/recv
Summary: 3rd party app WP-Photos crashes during upload → 3rd party app WP-Photos experiences an abnormal process termination during upload
dhylands mentions that we're likely seeing a OOM here. The logcat message means that the parent process is trying to communicate with a dead child process.
Based on the current analysis, let's track this. Although I have zero clue who would be a good candidate to work on this.
tracking-b2g18: --- → ?
Confirmed this is a OOM.


$ adb shell dmesg | tail -20
<4>[02-20 00:49:54.566] [108: b2g][LY] send_bkl_address
<4>[02-20 00:49:54.566] [108: b2g][LY] send_bkl_data
<4>[02-20 00:49:54.586] [108: b2g][ZYF] lcdc_set_bl level=16, 1
<4>[02-20 00:49:54.586] [108: b2g][LY] send_bkl_address
<4>[02-20 00:49:54.586] [108: b2g][LY] send_bkl_data
<6>[02-20 00:49:54.796] [audpp.c:audpp_dsp_event] DISABLE
<6>[02-20 00:49:54.796] [222: AudioOut_1][audpp.c:audpp_disable] Received CFG_MSG_DISABLE from ADSP
<6>[02-20 00:49:54.796] [222: AudioOut_1][adsp.c:msm_adsp_disable] disable 'AUDPPTASK'
<6>[02-20 00:49:54.796] [222: AudioOut_1][adsp.c:msm_adsp_put] closing module AUDPPTASK
<6>[02-20 00:49:54.796] [222: AudioOut_1][audmgr.c:audmgr_disable] session 0xc080a520
<6>[02-20 00:49:54.796] [578: audmgr_rpc][audmgr.c:audmgr_rpc_thread] rpc_reply status 0
<3>[02-20 00:49:54.806] [578: audmgr_rpc][audmgr.c:process_audmgr_callback] DISABLED
<4>[02-20 00:50:05.276] [28: kswapd0]select 1298 ((Preallocated a), adj 6, size 3165, to kill
<4>[02-20 00:50:05.276] [28: kswapd0]send sigkill to 1298 ((Preallocated a), adj 6, size 3165
<4>[02-20 00:50:06.097] [226: Gecko_IOThread]select 912 (Homescreen), adj 4, size 3753, to kill
<4>[02-20 00:50:06.097] [226: Gecko_IOThread]send sigkill to 912 (Homescreen), adj 4, size 3753
<4>[02-20 00:50:08.129] [28: kswapd0]select 451 (Browser), adj 2, size 2646, to kill
<4>[02-20 00:50:08.129] [28: kswapd0]select 460 (Browser), adj 2, size 3608, to kill
<4>[02-20 00:50:08.129] [28: kswapd0]select 947 (WP Photos), adj 2, size 20578, to kill
<4>[02-20 00:50:08.129] [28: kswapd0]send sigkill to 947 (WP Photos), adj 2, size 20578
I think it be helpful to make the OOM-process-kill messsages more explicit in logcat.  (I'm not immediately sure if this is possible.)
Hmm...I'm not sure who's a good person to ask to investigate this. Chris do you have any ideas?
Whiteboard: [apps watch list]
When I tried - in my Firefox OS Boilerplate App - to show previews of pictures from the pictures folder, the app crashes as well (about 20 pictures in the folder). I can use deviceStorage and iterate over the files, show their names, size etc, but adding them to images in the document brings the app down.

Could this be the same issue?

I'd be happy to put together such a test version of the app if you want to test it.
Andrew - Do you any ideas on who would be a good person to look into this? I don't have a clue who would work on this one.
Flags: needinfo?(overholt)
I'm stretching here but *maybe* Justin can help with the OOM messages (see comment 9) and *maybe* Fabrice or Mounir know something since it appears to be happening with the photo activity.
Flags: needinfo?(overholt)
> <4>[02-20 00:50:08.129] [28: kswapd0]select 451 (Browser), adj 2, size 2646, to kill
> <4>[02-20 00:50:08.129] [28: kswapd0]select 460 (Browser), adj 2, size 3608, to kill
> <4>[02-20 00:50:08.129] [28: kswapd0]select 947 (WP Photos), adj 2, size 20578, to kill
> <4>[02-20 00:50:08.129] [28: kswapd0]send sigkill to 947 (WP Photos), adj 2, size 20578

I'm not sure what you need help with, but I can at least interpret this.

We have three processes all with fg priority: Two browser processes plus the WP Photos process.  The WP Photos process is by far the largest (I'd guess that the sizes there are in pages because that puts the browser processes at ~10-15mb apiece, so the WP Photos process is probably at 20578 * 4kb = 80mb), so it gets killed.

I don't know why we have two browser processes both in the fg, but if WP Photos is actually using 80mb of memory, that's kind of irrelevant.
Minusing for tracking since this seems like an issue on the app's side, not ours.  Renom if that changes.
Lisa - Can you followup with the app developer here? The conclusion on discussions thus far indicates this is a bug in the app using way too much memory, not the phone.
Flags: needinfo?(adora)
We don't feel that the problem Robert describes in comment #11 is related or the same? Boilerplate is very simple. 

The Wordpress app worked on an earlier version of the OS. It just seems funny that it would stop working on later versions without changes to the app.
(In reply to Mark Coggins from comment #17)
> We don't feel that the problem Robert describes in comment #11 is related or
> the same? Boilerplate is very simple. 
> 
> The Wordpress app worked on an earlier version of the OS. It just seems
> funny that it would stop working on later versions without changes to the
> app.

In that case, that would imply a regression then (which I don't think anyone knew about). Which means this might actually be appropriate to track, then.
Flags: needinfo?(adora)
(In reply to Robert Nyman from comment #11)

> I'd be happy to put together such a test version of the app if you want to
> test it.

Yes, please do put together the most minimal test app you can and then Jason can use that to look for the regression window.
Flags: needinfo?(robert)
(In reply to Robert Nyman from comment #11)
> I'd be happy to put together such a test version of the app if you want to
> test it.
Hope its okay to chime in. In a vein similar to Robert's comment, I'd be happy to whip up a simpler version of the WPPhotos app focusing just on the image picking and upload flow if it would be helpful. 

One other observation fwiw: based on the log file Lisa provided in her review, the crash appears to occur during the XMLHttpRequest to upload the selected image.

Still haven't been able to reproduce the crash myself :(
(In reply to Lukas Blakk [:lsblakk] from comment #19)

> Yes, please do put together the most minimal test app you can and then Jason
> can use that to look for the regression window.

I put together a small test case, available at http://robertnyman.com/Firefox-OS/Firefox-OS-Boilerplate-App.zip. When I comment out the line showing the img:

p.appendChild(img)

alternatively 

filePresentation += "<p><img src='" + window.URL.createObjectURL(file) + "' alt=''></p>") 

it works fine.

Then I get a list of 32 images from the pictures folder, with name and size (most of them are at about 100kb on my device, none much bigger).

But if I enable showing an image, the app crashes after about 5-10 images.
Flags: needinfo?(robert)
Robert, although the files may only be 100KiB, each one decompresses to 1600x1200x4-bytes, or ~7.3MiB.  Trying to load 32 of them requires >234MiB, which is more than all of the available memory on (e.g.) Unagi; 5 to 10 of them (36 to 73MiB) is already pushing it.  Hence the abnormal termination.  dmesg output on the DUT will confirm the low-memory kill.

Rather than operating on full-size images, I believe the Gallery app extracts the thumbnails from the EXIF headers in the images and uses those as thumbnails as they are much smaller--but care still needs to be taken not to try to load too many all at once.
(In reply to Mike Habicher [:mikeh] from comment #22)
> Robert, although the files may only be 100KiB, each one decompresses to
> 1600x1200x4-bytes, or ~7.3MiB.  Trying to load 32 of them requires >234MiB,
> which is more than all of the available memory on (e.g.) Unagi; 5 to 10 of
> them (36 to 73MiB) is already pushing it.  Hence the abnormal termination. 
> dmesg output on the DUT will confirm the low-memory kill.
> 
> Rather than operating on full-size images, I believe the Gallery app
> extracts the thumbnails from the EXIF headers in the images and uses those
> as thumbnails as they are much smaller--but care still needs to be taken not
> to try to load too many all at once.

That's very interesting! In my mind, though, I'd probably expect the browser (app rendering engine) to rather scale them down on its own or stop rendering when it hits a memory limit, not take down the entire app.
Don't know when I'll get to this. Unassigning to allow someone to steal.
QA Contact: jsmith
Based on discussions on another bug with Pinterest, my hunch is this another image decoding issue eating up lots of memory, so the process gets killed. I don't think this is a regression per say.

Justin - Thoughts?
Flags: needinfo?(justin.lebar+bug)
Whiteboard: [apps watch list] → [apps watch list][MemShrink]
Sorry, could you be more specific as to what the question is?
Flags: needinfo?(justin.lebar+bug)
(In reply to Justin Lebar [:jlebar] from comment #26)
> Sorry, could you be more specific as to what the question is?

https://marketplace.firefox.com/reviewers/apps/review/wp-photos is having a OOM, probably due to some image decoding workflow used by the packaged app.

Do we know why the memory is spiking in the case where the camera is used in the web activity by this app that leads to an OOM?

I'll probably need to debug this IRL, as this requires a reviewer access to install the app.
(In reply to Jason Smith [:jsmith] from comment #27)
> (In reply to Justin Lebar [:jlebar] from comment #26)
> > Sorry, could you be more specific as to what the question is?
> 
> https://marketplace.firefox.com/reviewers/apps/review/wp-photos is having a
> OOM, probably due to some image decoding workflow used by the packaged app.
> 
> Do we know why the memory is spiking in the case where the camera is used in
> the web activity by this app that leads to an OOM?
> 
> I'll probably need to debug this IRL, as this requires a reviewer access to
> install the app.

Meant to say "we'll probably need to debug this IRL"
Actually, it's possible to test this on your own device if you push the WP-Photos app to your device using the FF OS simulator.
> Do we know why the memory is spiking in the case where the camera is used in the web 
> activity by this app that leads to an OOM?

No, there's not enough detail here to say what could be going on.  But this sort of thing may be easy to debug if you have a 512mb device (or a kernel hacked for such).  Just do the bad thing and then dump about:memory.  It's at least possible that the about:memory dump will tell you what we want to know.

Note that there were some bad camera leaks fixed only recently.
Whiteboard: [apps watch list][MemShrink] → [apps watch list][MemShrink:P2]
Whiteboard: [apps watch list][MemShrink:P2] → [apps watch list][MemShrink]
I presume the P2 was cleared by accident here.
Whiteboard: [apps watch list][MemShrink] → [apps watch list][MemShrink:P2]
I took another look at this app on v1.0.1 2013-06-12, and am still experiencing the same issue.  Any other updates?
To upload a photo to WordPress the image needs to be base64 encoded. It could be that the encoding process is causing the OOM. Are other apps able to upload large files?
(In reply to dan from comment #33)
> To upload a photo to WordPress the image needs to be base64 encoded. It
> could be that the encoding process is causing the OOM. 

IIRC, when I reviewed the log from Lisa the crash would be occurring after an XMLHTTPRequest was opened, at which point the image would have already been base64 encoded. Keeping a base64 encoded image string in memory may be contributing to the OOM error, but my suspicion has been the root cause has something to do with sending the XHR.

I'm still unable to duplicate the crash on my dev phone :(
Just attempted to reproduce with the STR in comment #0.  
on both 1.0.1 and 1.1:
The app didn't crash for me but the photo upload failed midway each time I tried and I was given an option to re-upload (with the same result)
Hey everyone - following up on this bug again. The developer uploaded a new build at the end of October a short while ago.

From my device w/1.1 (ZTE/Inari) I'm able to log in using the credentials provided above. Photos upload fine, app functions as you would expect.
Lisa - Should this bug still be open?  and if so does this need to be referred back to the OS team?
Flags: needinfo?(adora)
App was approved on November 12, 2013.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(adora)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.