1063658 - Work around memory-backed Blob lifetime issues when passed between processes by Web Activities by saving them to DeviceStorage-backed temporary files in .gallery/tmp and returning those instead.

Reporter

Description

•

10 years ago

Attached file flame-v2.1-email-attach-death.txt — Details

On bug 1062262 we're seeing the following phenomenon on FxOS v2.0 through trunk that's very reminiscent of bug 982779 which was supposed to fix this phenomenon (and landed on v2.0):

- Email app triggers gallery app via web activity.  Gallery app was not previously running, so a process is created.
- User selects an image that needs to be rotated per EXIF.  Gallery app rotates it and creates a memory-backed Blob.  User hits done, the memory-backed Blob is returned via the activity.
- The email app gets the Blob and starts doing its game where it slices up the Blob into chunks and uses XHR to fetch those chunks sequentially.
- The email app dies!  Well, sometimes it doesn't die and instead the email app's attaching process hangs.


From :djf and the bug, my understanding is that we haven't been hearing more about this because all FxOS devices except Tarako have the cameras configured so that we do not need to perform dynamic rotation.  And in general Camera/Gallery are using disk-backed Blobs.  Also, it seems like this depends on the gallery app's process getting killed; if the gallery app is open in the background for other reasons, things don't die horribly, etc.


Here's an excerpt from the v2.1 logcat I'm attaching.  PID-wise, 3884 is email, 4036 is the gallery process, 1429 is the parent process.

09-05 13:08:36.896 I/GeckoDump( 3884): LOG: compose: attach activity success: attachment1 [details] [diff] [review].jpg
09-05 13:08:37.086 I/Gecko   ( 3884): WLOG: attachBlobToDraft: fetching 0 to 524286 of 1923917
09-05 13:08:37.186 I/Gecko   ( 3884): WLOG: attachBlobToDraft: fetched
09-05 13:08:37.206 I/Gecko   ( 4036): ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost
09-05 13:08:37.226 I/Gecko   ( 1429): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 13:08:37.226 I/Gecko   ( 1429): ###!!! [Child][OnMaybeDequeueOne] Error: Channel error: cannot send/recv
 * that last one repeated a lot *
09-05 13:08:37.226 I/Gecko   ( 1429): ###!!! [Child][MessageChannel] Error: Channel error: cannot send/recv
09-05 13:08:37.226 I/Gecko   ( 1429): [Parent 1429] WARNING: FileDescriptorSet destroyed with unconsumed descriptors: file ../../../gecko/ipc/chromium/src/chrome/common/file_descriptor_set_posix.cc, line 20
 * that last one repeated a lot *
09-05 13:08:37.276 I/Gecko   ( 3884): WLOG: attachBlobToDraft: flushing
09-05 13:08:37.416 I/Gecko   ( 3884): WLOG: attachBlobToDraft: fetching 524286 to 1048572 of 1923917
09-05 13:08:37.426 I/Gecko   ( 1429): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1A0003,name=PContent::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value
09-05 13:08:37.426 I/Gecko   ( 1429): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x60003,name=???) Route error: message sent to unknown actor ID
09-05 13:08:37.426 I/Gecko   ( 1429): [Parent 1429] WARNING: waitpid failed pid:3884 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 261
 * asuth thinks everything else is boring because we never hear from PID 3884 or 4036 again *


A 2.0 log excerpt is much less chatty and looks more like (3769=email, 5039=gallery 619=parent), although the chattiness could be random:

I/Gecko   ( 3769): WLOG: attachBlobToDraft: fetching 0 to 524286 of 1923917
I/Gecko   ( 5039): ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost
I/Gecko   ( 3769): WLOG: attachBlobToDraft: fetched
I/Gecko   (  619): [Parent 619] WARNING: pipe error (238): Connection reset by peer: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 452
I/Gecko   ( 3769): WLOG: attachBlobToDraft: fetching 524286 to 1048572 of 1923917
I/Gecko   (  619): ###!!! [Parent][DispatchAsyncMessage] Error: Value error: message was deserialized, but contained an illegal value
I/Gecko   (  619): [Parent 619] WARNING: waitpid failed pid:3769 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 261
I/Gecko   (  619): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 1

•

10 years ago

any chance you can grab a log with MOZ_IPC_MESSAGE_LOG=1 ?

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 2

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #1)
> any chance you can grab a log with MOZ_IPC_MESSAGE_LOG=1 ?

Sure.  https://wiki.mozilla.org/IPDL/Five_minute_example and the fact that my first attempt didn't work seems to imply that I need a DEBUG build, so spinning one of those.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 3

•

10 years ago

Attached file flame-kk v2.2 trunk build with MOZ_IPC_MESSAGE_LOG=1 grep-filtered — Details

This is potentially the less boring log to look at. I grepped it down but will also include the full log next. Key facts:

- PIDs: root=8244, email=8646, gallery=8703

- I configured/built by using B2G_DEBUG=1 in my environment and putting it in the b2g .config file. The resulting build is somewhat slow and sucky on the Flame. This impacts things.

- The email app did not crash in this case, but this appears to be a timing thing. The gallery app is dead before the email app manages to even start processing the Blob and our IndexedDB transaction manages to die somehow even. I am assuming all the mystery blob stuff may still be useful.

- I bounded the pre-grepped log to start at the first mention of the gallery process being triggered. Its first line lines up with the last line of this excerpt here:
"""
09-05 22:49:37.061 I/GeckoDump( 8646): LOG: compose: attach: triggering web activity
09-05 22:49:38.871 I/Gecko ( 8244): [Parent 8244] ###!!! ASSERTION: old rule tree still referenced: 'Not Reached', file ../../../gecko/layout/style/nsStyleSet.cpp, line 1969
09-05 22:49:38.871 I/Gecko ( 8244): [Parent 8244] ###!!! ASSERTION: old rule tree still referenced: 'Not Reached', file ../../../gecko/layout/style/nsStyleSet.cpp, line 1969
09-05 22:49:39.181 I/GeckoDump( 8244): XXX FIXME : Got a mozContentEvent: activity-choice
"""

- The log excerpt lines up with when things got boring here:
09-05 22:50:11.391 I/Gecko ( 8244): [Parent 8244] ###!!! ASSERTION: old rule tree still referenced: 'Not Reached', file ../../../gecko/layout/style/nsStyleSet.cpp, line 1969

- The filter I used is:
grep -i --regexp blob --regexp LOG --regexp ERR --regexp WARN --regexp IndexedDB --regexp activity flame-kk-trunk-blob-sadness-MOZ_IPC_MESSAGE_LOG\=1.txt > flame-kk-trunk-blob-sadness-filtered.txt

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 4

•

10 years ago

Attached file flame-kk-trunk-blob-sadness-MOZ_IPC_MESSAGE_LOG=1.txt FULL — Details

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 5

•

10 years ago

uh, and for posterity, since I wasn't actually quite sure how to get the log to happen, this is what I ran from my linux command line to get my logs after first doing "adb shell stop b2g".  If you are lazy and don't provide all the environment variables, then web activities won't work and you'll have a bad time:

adb shell MOZ_IPC_MESSAGE_LOG=1 GRE_HOME=/system/b2g LD_PRELOAD=/system/b2g/libmozglue.so LD_LIBRARY_PATH=/vendor/lib:/system/lib:/system/b2g /system/b2g/b2g &> /path/to/logfile

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 6

•

10 years ago

Attached file flame-kk-trunk-blob-reproduced-filtered.txt — Details

I worked around whatever that IndexedDB problem is by using an existing draft.  (Draft saving worked fine if I just did "start b2g", but my MOZ_IPC_MESSAGE_LOG=1 usage would break it.)

- PIDs: root=10699, email=11134, gallery=11169

Not much extra happens in the log.  Mainly this:

WLOG: attachBlobToDraft: fetching 0 to 524286 of 1923917
[time:1409974229582548][11134->10699][PContentChild] Sending Msg_PBlobConstructor([TODO])
[time:1409974229586959][11134->10699][PBlobChild] Sending Msg_PBlobStreamConstructor([TODO])
[time:1409974229594530][10699<-11134][PIndexedDBTransactionParent] Received Msg_AllRequestsFinished([TODO])
[time:1409974229600288][10699<-11134][PContentParent] Received Msg_PBlobConstructor([TODO])
[Parent 10699] WARNING: NS_ENSURE_SUCCESS(rv, nullptr) failed with result 0x8000FFFF: file ../../../gecko/dom/ipc/Blob.cpp, line 1876
[time:1409974229619743][10699->0][PIndexedDBTransactionParent] Sending Msg_Complete([TODO])
[Parent 10699] WARNING: pipe error (192): Connection reset by peer: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 451
[Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
[Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
[Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 7

•

10 years ago

Attached file flame-kk-trunk-blob-reproduced-full.txt — Details

There's some stuff in the logcat that doesn't seem to have come out in the log too:

09-05 23:30:28.930 I/Gecko   (11134): WLOG: attachBlobToDraft: retrieving message
09-05 23:30:29.180 E/OomLogger(10699): [Kill]: lowmem_shrink: convert oom_adj to oom_score_adj:
09-05 23:30:29.300 E/OomLogger(10699): [Kill]: lowmem_shrink: convert oom_adj to oom_score_adj:
09-05 23:30:29.390 E/OomLogger(10699): [Kill]: lowmem_shrink: convert oom_adj to oom_score_adj:
09-05 23:30:29.570 I/Gecko   (11134): WLOG: attachBlobToDraft: fetching 0 to 524286 of 1923917
09-05 23:30:29.590 I/Gecko   (10699): [Parent 10699] WARNING: NS_ENSURE_SUCCESS(rv, nullptr) failed with result 0x8000FFFF: file ../../../gecko/dom/ipc/Blob.cpp, line 1876
09-05 23:30:29.590 I/Gecko   (10699): 
09-05 23:30:29.590 I/Gecko   (10699): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C0003,name=PContent::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value
09-05 23:30:29.590 I/Gecko   (10699): 
09-05 23:30:29.590 I/Gecko   (10699): 
09-05 23:30:29.590 I/Gecko   (10699): ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x60003,name=???) Route error: message sent to unknown actor ID
09-05 23:30:29.590 I/Gecko   (10699): 
09-05 23:30:29.620 I/Gecko   (10699): [Parent 10699] WARNING: pipe error (192): Connection reset by peer: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 451
09-05 23:30:29.620 I/Gecko   (10699): [Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
09-05 23:30:29.620 I/Gecko   (10699): [Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
09-05 23:30:29.620 I/Gecko   (10699): [Parent 10699] WARNING: pipe error: Broken pipe: file ../../../gecko/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 728
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.630 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.630 I/Gecko   (10699): 
09-05 23:30:29.650 I/Gecko   (10699): 
09-05 23:30:29.650 I/Gecko   (10699): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
09-05 23:30:29.650 I/Gecko   (10699):

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 8

•

10 years ago

:bent, can you or someone else review the MOZ_IPC_MESSAGE_LOG=1 logs?  I do understand you are all super-busy and last week was a focused work week.

[Blocking Requested - why for this release]:
Memory-backed Blobs causing process death is a potentially major problem for anything using web activities.  This affects v2.0-v2.2 on at least Flame and Nexus 4; per :djf it sounds like devices we actually ship don't need to perform EXIF rotation and so the gallery/email repro case may not be a huge problem for our shipped apps on known devices.  However, I worry about the ecosystem ramifications, so I think since we're still well inside 2.1 stabilization and trunk drift should hopefully be contained, it seems good try and get this addressed for v2.1 and evaluate v2.0 uplift.  (Hopefully the recent IndexedDB/PBackground stuff is orthogonal.)

blocking-b2g: --- → 2.1?

Flags: needinfo?(bent.mozilla)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Updated

•

10 years ago

Comment 9

•

10 years ago

We landed a workaround for SMS in bug 1059233 (not yet in v2.1 or v2.0, so it's easy to check on these branches), I think it's related to this issue. I'd like to remove the workaround if possible.

Andrew, do you see the issue in v1.4? I also asked for a regression window in bug 1059233, hopefully we'll know what caused this.

Dietrich Ayala (:dietrich)

Comment 10

•

10 years ago

"This affects v2.0-v2.2 on at least Flame and Nexus 4; per :djf it sounds like devices we actually ship don't need to perform EXIF rotation and so the gallery/email repro case may not be a huge problem for our shipped apps on known devices. "

Andrew, do we truly need to hold 2.1 on this? If it's not this, it's another activity-chain with another memory inflation scenario, right? I'm also concerned about the risk of changes like this in stabilization period.

blocking-b2g: 2.1? → backlog

Flags: needinfo?(bugmail)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 11

•

10 years ago

Sorry, I'm not able to look at this any time soon...

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 13

•

10 years ago

[Blocking Requested - why for this release]:  Re-requesting because there was some confusion in comment 10 about the manifestation of this problem and the risk profile.  See below for more.

(In reply to Dietrich Ayala (:dietrich) from comment #10)
> Andrew, do we truly need to hold 2.1 on this? If it's not this, it's another
> activity-chain with another memory inflation scenario, right? I'm also
> concerned about the risk of changes like this in stabilization period.

This is not a memory-inflation scenario, it's death-by-web-activity-by-Blob.  The problem case here is:
- App "gonna-die" triggers an activity that opens app "returns-a-memory-backed-Blob" which was not already active.
- "returns-a-memory-backed-Blob" does new Blob([stuff], stuff) and returns it via its activity.
- RACE RACE RACE RACE RACE
  - "returns-a-memory-backed-Blob"'s process is terminated because it's no longer needed
  - "gonna-die" gets a reference to the Blob.  Attempts to use the Blob may either fail or result in "gonna-die" dying.  If "gonna-die" is very fast it may be able to use the Blob before "returns-a-memory-backed-Blob".

This is fundamentally a platform issue.  One that we can mitigate in our own apps (as we did for Tarako and 1.3 and may end up doing for 2.0/2.1/2.2), but if we want to be all "hey, other developers, it's a platform you can use", it's arguably friendly for us to not leave random app-breaking and app-killing edge-cases all over the place.

I'm arguing for trying to fix this earlier in 2.1 stabilization rather than later because the longer we wait to address this the greater the chance that an uplift to v2.1 becomes infeasible.

There are mitigations we can put in place:
- (limited) just putting some Gallery-mitigations back in place that already existed (see https://bugzilla.mozilla.org/show_bug.cgi?id=1059233#c16) to protect apps from death-by-Gallery-app, but the platform issues still remain.
- (limited) having the system app keep triggered processes around "long enough" to ensure the Blob is fine.  Unfortunately "long enough" is not something you can really determine.

blocking-b2g: backlog → 2.1?

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 14

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #9)
> Andrew, do you see the issue in v1.4? I also asked for a regression window
> in bug 1059233, hopefully we'll know what caused this.

Hadn't tried to reproduce, just updated my hamachi on 1.4 to usefully reproduce, but its wi-fi is broken.  So perhaps it makes sense to trust the bug 1059233 regression check, but keeping in mind that because of the involvement of races and all the other factors involved and the lack of a canonical STR on bug 1059233 with the regression check (it doesn't address things like making sure the gallery app was closed when triggering the activity), I'm not going to put too much stock in it.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 15

•

10 years ago

I understand what you mean Andrew. I'll do more test today. I think we need a fix at least in v2.0 about this, and maybe the fix should be the workaround in the Gallery app. 

We also need to fix this issue for good, and find why bug 982779 regressed or maybe did not fix the issue correctly. I have a NI in bug 1059233 to look further today.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 16

•

10 years ago

For the issue we're seeing in bug 1059233, I'm confident it works in v1.4 (tried latest peak build from geeksphone) and fails in v2.0 (latest Flame build from OTA).

Now I'll try to reproduce the regression window found by QA.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 17

•

10 years ago

This is also affecting SMS in bug 1059233. As alternatives:

* we could uplift the workaround from bug 1059233 already landed in master. I'd rather backout it and this wouldn't fix the email bug anyway.
* apply the ugly workaround in Gallery from Bug 975599 (was removed in bug 1006919) again, but I think :djf doesn't want this.

The regression window in bug 1059233 wasn't helpful and only pointed to bug 1006919, which is strange because it was marked as "verified" back at the time. Anyway it doesn't point to any gecko regression. When I try to check the regression window myself, I get a crash as soon as I get the "crop" window, which makes it impossible to verify the bug (I don't know how QA managed to get a regression window..).

I'll now try the STR from bug 1062262.

blocking-b2g: 2.1? → 2.0?

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 18

•

10 years ago

I reproduce the issue in bug 1062262 in v1.4; I also reproduce the issue in SMS if I use a small enough image. This is quite funny (sort of) because I definitely don't reproduce bug 1059233 on v1.4.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 19

•

10 years ago

STR for SMS:
1. have attachment 8492258 [details]
2. open SMS and start a new message
3. press "pick" button and choose gallery
4. choose image from step 1, don't crop and accept
5. tap the image in SMS' composer, then tap view

=> the gallery crashes instantly

Wayne Chang [:wchang]

Comment 20

•

10 years ago

Bhavana,

Please see blocking requests above from Andrew and Julien

Flags: needinfo?(bbajaj)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 21

•

10 years ago

It is several months too late to block 2.0 on this.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 22

•

10 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #21)
> It is several months too late to block 2.0 on this.

Can you clarify?  Is this a pedantic statement or is a fix intractable for v2.0 for some reason?  For the email app at least we're still getting/have v2.0 blockers.  I think the idea would be that it would be good to prioritize an investigation of this problem with targeted uplift of any fix (if viable) to v2.0.  We will likely have to workaround in gaia on v2.0 and arguably partners are going to be throwing around "acceptance blocker" and all that stuff if it's not fixed one way or the other.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 23

•

10 years ago

I have no idea what the cause of the bug is, so I can't say whether it's possible to put together a fix for 2.0 or not.  Based on our past experience here I'd expect any fix to be relatively complicated.

But more importantly, it's unlikely that we will have resources (meaning me or bent) to investigate and fix this on any short time scale.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 24

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #19)
> => the gallery crashes instantly

Julien, if this is reliably reproducible for you, is there any chance you can attach gdb to the gallery app before you do this and get a stack trace of the crash?

Flags: needinfo?(bent.mozilla)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 25

•

10 years ago

Yep, I'll try this.

Flags: needinfo?(felash)

Dietrich Ayala (:dietrich)

Comment 26

•

10 years ago

Triage group reviewed - blocking+ for 2.0. There are enough comments about the user impact in this bug to have great concern. If porting a fix to 2.0 becomes impossible, renom.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 27

•

10 years ago

Ben, I have some difficulties having a debug build. I built for Hamachi but the /system partition is too small for a debug build; and I can't have a working backup-flame for Flame. So I'm quite stuck for today.

If you want to try on your side, you need to backout bug 1059233 to be able to reproduce the STR in comment 19.

Keeping the NI.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 28

•

10 years ago

David, how do you feel having your ugly workaround on v2.0 again?

Flags: needinfo?(dflanagan)

Dietrich Ayala (:dietrich)

Updated

•

10 years ago

blocking-b2g: 2.0? → 2.0+

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 29

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #27)
> Ben, I have some difficulties having a debug build.

Thanks Julien. You shouldn't need a DEBUG build just to get a stack trace of the crash though. As long as you build it yourself I think gdb should be able to produce something usable.

Of course, a DEBUG build would be a lot better, since it would have assertions/warnings too...

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 30

•

10 years ago

Got a debug build, but couldn't get a gdb stacktrace... There is no more preallocated process so I can't attach to it, and I couldn't get a stacktrace when email failed using the STR from bug 1062262.

Will try to get some help tomorrow but I'm sure it would be easier and faster if someone else try to do it.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 31

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #29)
> (In reply to Julien Wajsberg [:julienw] from comment #27)
> > Ben, I have some difficulties having a debug build.
> 
> Thanks Julien. You shouldn't need a DEBUG build just to get a stack trace of
> the crash though. As long as you build it yourself I think gdb should be
> able to produce something usable.

So actually not, I couldn't get any stack trace, either from a DEBUG or non-DEBUG build. It's probably me though :/

Flags: needinfo?(felash)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Updated

•

10 years ago

Flags: needinfo?(felash)

Gregor Wagner [:gwagner]

Comment 32

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #31)
> (In reply to ben turner [:bent] (use the needinfo? flag!) from comment #29)
> > (In reply to Julien Wajsberg [:julienw] from comment #27)
> > > Ben, I have some difficulties having a debug build.
> > 
> > Thanks Julien. You shouldn't need a DEBUG build just to get a stack trace of
> > the crash though. As long as you build it yourself I think gdb should be
> > able to produce something usable.
> 
> So actually not, I couldn't get any stack trace, either from a DEBUG or
> non-DEBUG build. It's probably me though :/

Alex, can you help Julien here?

Flags: needinfo?(lissyx+mozillians)

David Flanagan [:djf]

Assignee

Comment 33

•

10 years ago

(In reply to Julien Wajsberg [:julienw] from comment #28)
> David, how do you feel having your ugly workaround on v2.0 again?

I don't like the idea, but if that is what we need to do, at least we know we have some tested code we can use.

Note, however, that the workaround from 975599 won't work as-is. That workaround was only for the case where the user was picking an image and cropping it (turning the original file-backed blob into a memory-backed blob.)

The STR in comment 19 explicitly say not to crop the image.  In this case, however, the test image that Julien has attached uses EXIF orientation.  Since EXIF orientation does not work on the web, the gallery rotates the image before returning it from the pick activity. So in this case (with this specific test image) the image is being converted from a file-backed blob to a memory-backed blob without any cropping.

The share activity does something similar: it will return a memory-backed blob if the image being shared relies on EXIF orientation.

So we'll need to modify the patch from bug 975599 so that it saves any memory-backed image, either picked or shared, to the card.

Let me know if this becomes necessary. It is probably most efficient for me to do that work.

Flags: needinfo?(dflanagan)

:gerard-majax

Comment 34

•

10 years ago

So far, we were unable to reproduce on my Flame.

Flags: needinfo?(lissyx+mozillians)

:gerard-majax

Comment 35

•

10 years ago

Attached file stacktrace — Details

:gerard-majax

Comment 36

•

10 years ago

09-26 15:53:36.157   297   297 I Gecko   : ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C0003,name=PContent::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value

:gerard-majax

Comment 37

•

10 years ago

So the signal received is a SIGTERM, that comes from the creation of the mMonitor() at http://dxr.mozilla.org/mozilla-central/source/dom/ipc/Blob.cpp#167

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 38

•

10 years ago

So we managed to reproduce on Alexandre's Flame, but only once we reduced the available memory. (on my own Flame I had 1G and the bug reproduced... let's just say it can depend on the available memory).

Stack trace has been provided so removing NI for now. But please ask if you need anything from me.

Flags: needinfo?(felash)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 39

•

10 years ago

This looks like we're just out of memory...

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 40

•

10 years ago

I doubt it, the workaround patch in bug 1059233 fixes the issue, but it's cloning the blob in memory... That would surely fail if we're out of memory.

:gerard-majax

Comment 41

•

10 years ago

Ben do you see any reason for getting SIGTERM in a memset() call?

Flags: needinfo?(bent.mozilla)

:gerard-majax

Comment 42

•

10 years ago

So, SIGTERM comes from http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/ipc/chromium/src/base/process_util_posix.cc#l70

Let's continue to find why.

Assignee: nobody → lissyx+mozillians

Flags: needinfo?(bent.mozilla)

:gerard-majax

Comment 43

•

10 years ago

Attached file ipc error stacktrace — Details

stacktrace when KillProcess is called

:gerard-majax

Comment 44

•

10 years ago

It means that:
> 09-29 15:46:01.616  3928  3928 I Gecko   : ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C0003,name=PContent::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value

is a valid error message, I think.

> (gdb) f 5
> #5  0xb5192ce6 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne (this=<optimized out>) at ../../../gecko/ipc/glue/MessageChannel.cpp:1098
> 1098	        DispatchMessage(recvd);
> (gdb) l
> 1093	        AutoEnterRPCTransaction transaction(this, &recvd);
> 1094	
> 1095	        MonitorAutoUnlock unlock(*mMonitor);
> 1096	
> 1097	        CxxStackFrame frame(*this, IN_MESSAGE, &recvd);
> 1098	        DispatchMessage(recvd);
> 1099	    }
> 1100	    return true;
> 1101	}
> 1102	
> (gdb) print recvd 
> $1 = {<Pickle> = {static kPayloadUnit = 1179403647, header_ = 0xac58db00, header_size_ = 32, capacity_ = 128, variable_buffer_offset_ = 0}, _vptr.Message = 0xb664e490, file_descriptor_set_ = {
>     mRawPtr = 0x0}, name_ = 0xb626927a "PContent::Msg_PBlobConstructor"}

:gerard-majax

Comment 45

•

10 years ago

Attached file MsgValueError stacktrace — Details

:gerard-majax

Comment 46

•

10 years ago

(gdb) print params
$2 = {mValue = {VChildBlobConstructorParams = "\000\021ҩ", '\000' <repeats 13 times>, "\004\000\000\000\000\000\000FR+\266\000\000\000\000\003\000\000\000@\201\064\251\003\000\000", 
    VParentBlobConstructorParams = "\000\021ҩ", '\000' <repeats 13 times>, "\004\000\000\000\000\000\000FR+\266\000\000\000\000\003\000\000\000@\201\064\251\003\000\000\000\254\321?\264\001\000\000"}, mType = mozilla::dom::BlobConstructorParams::TParentBlobConstructorParams}

Josh Matthews [:jdm]

Comment 47

•

10 years ago

Comment on attachment 8496843 [details]
MsgValueError stacktrace

It appears that AllocPBlobParent returned null, which means you probably want to look at BlobParent::Create.

Josh Matthews [:jdm]

Comment 48

•

10 years ago

That also means that the child is sending a MysteryBlob constructor.

Josh Matthews [:jdm]

Comment 49

•

10 years ago

No, sorry, the Slice call could be failing instead.

:gerard-majax

Comment 50

•

10 years ago

So, the AllocPBlobParent() call, which is the one failing in attachment 8496843 [details] indeed calls BlobParent::Create(), http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/dom/ipc/Blob.cpp#l1117

I see no code handing the type visible comment 46: BlobConstructorParams::TParentBlobConstructorParams

:gerard-majax

Comment 51

•

10 years ago

Attached file stepping into AllocPBlobParent — Details

Looks like it's slice-related.

:gerard-majax

Comment 52

•

10 years ago

Attached file GetSize()/CreateSlice() — Details

:gerard-majax

Comment 53

•

10 years ago

According to attachment 8496852 [details], it would be CreateSlice() that fails because of mActor being NULL, and making the AllocPBlobParent() failing in result.

:gerard-majax

Comment 54

•

10 years ago

FYI, I have very easy reproductibility with:
 - the image provided by julien
 - flame set at 319MB
 - revering bug 1059233

:gerard-majax

Comment 55

•

10 years ago

> 09-29 17:17:32.056 13662 13662 I Gecko   : BlobParent::RemoteBlob::CreateSlice this=0xac148280 mActor=0xa92e8800
> 09-29 17:17:36.406 13662 13662 I Gecko   : BlobParent::RemoteBlob::CreateSlice this=0xac148280 mActor=0x0
> 09-29 17:17:36.406 13662 13662 I Gecko   : 
> 09-29 17:17:36.406 13662 13662 I Gecko   : ###!!! [Parent][DispatchAsyncMessage] Error: (msgtype=0x1C0003,name=PContent::Msg_PBlobConstructor) Value error: message was deserialized, but contained an illegal value

:gerard-majax

Comment 56

•

10 years ago

Attached file ActorDestroy stracktrace when mActor is nulled — Details

:gerard-majax

Comment 57

•

10 years ago

Ben, maybe you can have another look with all those new infos ?

Flags: needinfo?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 58

•

10 years ago

Ok, it looks like the child sends the parent a blob and we copy it over. Then the child dies for some reason (attachment 8496896 [details] says it's a channel error, but I bet that's just due to bug 1035454). Then the parent tries to slice the blob and fails because we've already cleaned up the child actors.

There's no short-term fix for this. We've been talking about redoing the way we slice blobs, but that's going to be a scary and invasive patch.

Flags: needinfo?(bent.mozilla)

:gerard-majax

Comment 59

•

10 years ago

Thanks! What should we be doing then, if there is no safe fix we can provide given it's 2.0+ ?

Flags: needinfo?(bent.mozilla)

Flags: needinfo?(anygregor)

Gregor Wagner [:gwagner]

Comment 60

•

10 years ago

[Blocking Requested - why for this release]:

(In reply to Dietrich Ayala (:dietrich) from comment #26)
> Triage group reviewed - blocking+ for 2.0. There are enough comments about
> the user impact in this bug to have great concern. If porting a fix to 2.0
> becomes impossible, renom.

blocking-b2g: 2.0+ → 2.0?

Flags: needinfo?(anygregor)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 61

•

10 years ago

Well, I'd retest now that bug 994190 landed, but if that doesn't help then we could change the activities code to keep the child process alive a little bit longer...

Flags: needinfo?(bent.mozilla)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 62

•

10 years ago

IMO the best bet is to land the workaround in Gallery. Kind of sad but the safest, and possible for v2.0.

The consequences in SMS are not that bad, I could live with it, but the consequences in E-Mail are really bad.

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 63

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #61)
> Well, I'd retest now that bug 994190 landed, but if that doesn't help then
> we could change the activities code to keep the child process alive a little
> bit longer...

Do we expect this to 100% address the problem or just make us more likely to win a race?  Specifically, does:
A) the recipient process (email) need the originating process (gallery) to still be alive when it goes to use the Blob
or just
B) the parent process needs the originating process (gallery) to still be alive to finish the process of completely replicating the Blob in the parent process, but then email can wait as long as it wants to do things with the Blob.

Because if it's "B" that sounds great.

Flags: needinfo?(bent.mozilla)

Wayne Chang [:wchang]

Comment 64

•

10 years ago

[Blocking Requested - why for this release]:

Going to 2.1? as it seems that this would not impact the known to be releasing 2.0 devices.

Please let me know if my understanding is incorrect, otherwise lets get this fixed on 2.1 and do an 2.0 uplift approval if available.

blocking-b2g: 2.0? → 2.1?

:gerard-majax

Comment 65

•

10 years ago

Looks like I reproduced this on my Flame, after restoring my Nexus S profile and opening a smart collection (Social).

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 66

•

10 years ago

[Blocking Requested - why for this release]:

(In reply to Wayne Chang [:wchang] from comment #64)
> [Blocking Requested - why for this release]:
> 
> Going to 2.1? as it seems that this would not impact the known to be
> releasing 2.0 devices.

Where do you get this, Wayne?

blocking-b2g: 2.1? → 2.0?

:gerard-majax

Comment 67

•

10 years ago

Leaving the bug so that someone else can address this.

Assignee: lissyx+mozillians → nobody

Andrew Overholt [:overholt]

Comment 68

•

10 years ago

If we land the workaround in the gallery, does it fix the problem for email?

There's an unknown amount of risk here (with an unknown fix) and at this point in 2.0's timeline it seems tight.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 69

•

10 years ago

(In reply to Andrew Overholt [:overholt] from comment #68)
> If we land the workaround in the gallery, does it fix the problem for email?

Not the existing workaround, no.  See comment 33.

Flags: needinfo?(bugmail)

bhavana bajaj [:bajaj]

Comment 70

•

10 years ago

:djf,

Given the above comment can you comment on how risky would it be to re-write and uplift a patch per https://bugzilla.mozilla.org/show_bug.cgi?id=1063658#c33 to fix this issue in 2.0 ?

Flags: needinfo?(bbajaj)

Keywords: qawanted

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 71

•

10 years ago

(NI implied per comment 70)

Flags: needinfo?(dflanagan)

Joshua Mitchell (Inactive)

Comment 72

•

10 years ago

I think this QA-Wanted tag was added by accident instead of a NI

Keywords: qawanted

Wayne Chang [:wchang]

Comment 73

•

10 years ago

(In reply to Julien Wajsberg [:julienw] (PTO until 7th October, ask :steveck or :azasypkin for SMS stuff) from comment #66)
> [Blocking Requested - why for this release]:
> 
> (In reply to Wayne Chang [:wchang] from comment #64)
> > [Blocking Requested - why for this release]:
> > 
> > Going to 2.1? as it seems that this would not impact the known to be
> > releasing 2.0 devices.
> 
> Where do you get this, Wayne?

I thought comment 6 suggested that devices need to be doing EXIF rotation to hit this.

Andrew,
Can you provide an easy to follow STR so we can ask QA to test this on the known launching 2.0 devices to assess this? Thanks!

Mike,
Once we have the STR can you please test a few devices (check with Rachelle/Howie on which to test)?

Thanks.

Flags: needinfo?(bugmail)

Wayne Chang [:wchang]

Updated

•

10 years ago

Flags: needinfo?(mlien)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 74

•

10 years ago

(In reply to Wayne Chang [:wchang] from comment #73)
> Can you provide an easy to follow STR so we can ask QA to test this on the
> known launching 2.0 devices to assess this? Thanks!

Download the first attachment (90 degrees) from bug 1062262 to any Firefox OS device.  Try to attach it to an email via the "gallery" app.  Observe failure to attach with high probability.

As noted above in the bug, this may also reproduce on freshly taken pictures on devices that take pictures with EXIF rotation rather than rotating the image before saving to disk.  I have no innate knowledge of shipping 2.0 devices or their camera capabilties and would have to do extensive research of private documents to figure this out.  I'm currently dealing with several blockers and I believe we have multiple layers of management that keep track of this, so I'll defer to them to perform the analysis.

Flags: needinfo?(bugmail)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 75

•

10 years ago

NI me, I want to check whether the email bug happens when cropping too.

Flags: needinfo?(felash)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Updated

•

10 years ago

Comment 76

•

10 years ago

(In reply to bhavana bajaj [:bajaj] from comment #70)
> :djf,
> 
> Given the above comment can you comment on how risky would it be to re-write
> and uplift a patch per
> https://bugzilla.mozilla.org/show_bug.cgi?id=1063658#c33 to fix this issue
> in 2.0 ?

Not very risky. We had it working in 1.3 and 1.4. I think I can generalize it to handle the EXIF rotation as well as the cropping case and also the share activity as well as the pick activity.

Given Ben's pessimism about a platform fix in comment #58, and given that bug 1077106 is probably the same underlying issue, I'm taking this bug and will put the hack back in gallery and will request uplift approval for 2.1 and 2.0.

Assignee: nobody → dflanagan

Flags: needinfo?(dflanagan)

David Flanagan [:djf]

Assignee

Comment 77

•

10 years ago

(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #58)
> 
> There's no short-term fix for this. We've been talking about redoing the way
> we slice blobs, but that's going to be a scary and invasive patch.

Ben: do you have a bug filed for the long-term fix? I'd like to have a bug number I can cite in my workaround.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 78

•

10 years ago

Yep, bug 1079546.

Flags: needinfo?(bent.mozilla)

David Flanagan [:djf]

Assignee

Comment 79

•

10 years ago

I can reproduce this issue in the following cases:

1) Start email app. Pick a photo from Gallery that uses EXIF rotation. (test image 55 e.g.)
2) Start MMS app. Pick a photo from Gallery that uses EXIF rotation. (test image 55 e.g.)
3) Start MMS app. Pick a photo photo that does not have a small file size (a photo or either of the test images that display test patterns, e.g.) and crop it.
4) Start Gallery share a photo with EXIF rotation (test photo 54, e.g.) with the email app
5) Share test photo 54 with the MMS app
6) Share test photo 54 with the Wallpaper app

David Flanagan [:djf]

Assignee

Comment 80

•

10 years ago

Attached file link to patch on github — Details

This patch fixes the bug in each of the scenarios described above.

I'm asking for Punam's review because she worked on this workaround the first time around. Punam: I've restructured the code a bit, but this should all look very familiar to you.

Andrew: I'm asking for your review because you've got fresh eyes and your app is most impacted by this bug.

When we landed this workaround the first time for v1.3 and 1.4, we never had any tests written for it. I'm comfortable landing it again without tests. (But if there is disagreement about that from the reviewers, I'll see what I can do, though it will depend on whether we've got a good MockDeviceStorage already written somewhere.)

Attachment #8501423 - Flags: review?(pdahiya)

Attachment #8501423 - Flags: review?(bugmail)

David Flanagan [:djf]

Assignee

Comment 81

•

10 years ago

I should mention that the test images I refer to in comment #79 are in apps/gallery/test/images/

bhavana bajaj [:bajaj]

Updated

•

10 years ago

blocking-b2g: 2.0? → 2.0+

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 82

•

10 years ago

(In reply to David Flanagan [:djf] from comment #79)
> I can reproduce this issue in the following cases:
> 
> 1) Start email app. Pick a photo from Gallery that uses EXIF rotation. (test
> image 55 e.g.)
> 2) Start MMS app. Pick a photo from Gallery that uses EXIF rotation. (test
> image 55 e.g.)
> 3) Start MMS app. Pick a photo photo that does not have a small file size (a
> photo or either of the test images that display test patterns, e.g.) and
> crop it.

Actually, the trick is really to crop it small enough so that the resulting image is < 300K, the bug should happen with any image in that case.


> 4) Start Gallery share a photo with EXIF rotation (test photo 54, e.g.) with
> the email app
> 5) Share test photo 54 with the MMS app
> 6) Share test photo 54 with the Wallpaper app

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 83

•

10 years ago

Comment on attachment 8501423 [details] [review]
link to patch on github

r=asuth.  I also have some minor enhancements that I think would be nice to have, I've put them in 2 separate commits available at https://github.com/asutherland/gaia/compare/mozilla-b2g:master...asutherland:gallery-nuts-to-temp-files?expand=1 that you should feel free to squash into your patch as you like.

They are:
1) Have the pick activity provide the original filename as a separate "name" in the activity so that the filename doesn't end up being the new random filename you generated.  Email already had some old fallback logic in place for this, but in the wrong order for this use-case.  I'm assuming we used to use "name" somewhere or still do somewhere elsewhere.

2) Have the file killer also guard against files that are from the future.  In general I think our devices prefer living in the past rather than the future so this shouldn't be necessary, but it seems dangerous to otherwise potentially let a huge number of files accumulate if someone does run into this problem.  The flip side of this is that if the device has some kind of absurd time-date oscillation that the temp files might end up only living as long as a single pick/share activity.  However, that's fine for the email app.


Over-engineering-wise, I think it'd be preferable to have some kind of numeric/LRU-style bound on killing the temporary files but am fine with what we have right now on the basis that:
- We shipped something like this before
- Simple is good
- It helps protect apps that make risky assumptions about the lifetime of the user's images.  A day seems to strike a nice balance.
- The most semantically correct option of using/abusing reference-counted IndexedDB blobs seems likely to have its own scary risk problems, especially given the recent regressions we've seen in this area.


I tested on a v2.2 flame device with share and pick and manually used "touch -t" to verify that the reaping logic obliterates the future and the past correctly.

Attachment #8501423 - Flags: review?(bugmail) → review+

David Flanagan [:djf]

Assignee

Comment 84

•

10 years ago

(In reply to Julien Wajsberg [:julienw] (PTO until 7th October, ask :steveck or :azasypkin for SMS stuff) from comment #82)
> (In reply to David Flanagan [:djf] from comment #79)
> > I can reproduce this issue in the following cases:
> > 
> > 1) Start email app. Pick a photo from Gallery that uses EXIF rotation. (test
> > image 55 e.g.)
> > 2) Start MMS app. Pick a photo from Gallery that uses EXIF rotation. (test
> > image 55 e.g.)
> > 3) Start MMS app. Pick a photo photo that does not have a small file size (a
> > photo or either of the test images that display test patterns, e.g.) and
> > crop it.
> 
> Actually, the trick is really to crop it small enough so that the resulting
> image is < 300K, the bug should happen with any image in that case.
> 

I've got a bunch or text images now that are just text on a solid background so they end up with a small filesize. And when cropped, the cropped region is also small, and I can't reproduce the bug on those. Ben thinks that the underlying bug is all about slicing (which is what the Email app does).  So I'm thinking that maybe what's going on here (and the reason that bug 1077106 is reported only on 2.2) is that I convinced you to switch to using shared/js/image_utils.js which slices the first 32kb of jpeg files. So with jpegs smaller than 32kb the bug is not triggered.  In any case, according to Ben this isn't a lifetime bug anymore but a slice() bug.  I'm not sure how deeply he's looked into it.

In any case, I think I'm going to try modifying apps/sms/test/marionette/apps/activitycaller to see if I can make it reproduce this bug and land that change along with this patch so we'll know if we break it again.

David Flanagan [:djf]

Assignee

Comment 85

•

10 years ago

Andrew,

Thanks for the review and the add-on code. I'll take a look and see if I can do anything about the name issue.  Somewhat related: with this fix in place, your app can remove the nocrop:true property from the activity request and allow users to crop the images that they're attaching, if you feel so inclined.

Also: does email have any integration tests that could be modified to exercise this bug and its patch?

Flags: needinfo?(bugmail)

David Flanagan [:djf]

Assignee

Comment 86

•

10 years ago

I'm striking out on my idea of modifying the sms app marionette tests since they all fail in the marionette harness when I run them locally. If I can't even run them, I certainly can't modify them. Also, I've now realized that the sms tests uses their own custom app to do the activities, and modifying the tests to actually use gallery would probably be harder and would introduce an unwanted dependency between the two apps.  So maybe it is the python marionette tests where this should go? In either case, possibly too much work right now for a patch that we just need to get landed.

David Flanagan [:djf]

Assignee

Comment 88

•

10 years ago

The patch for this bug fixes bug 1077106, so I've closed that bug as a dupe of this one.
That bug is a SMS regression and a smoketest blocker, so I'm carrying those flags over here.

Keywords: regression, smoketest

David Flanagan [:djf]

Assignee

Comment 89

•

10 years ago

Given that this is now a smoketest issue, I think I should get the workaround landed today.  I'll file a followup bug to try to get an automated test that can ensure we don't regress this for a third time.

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 90

•

10 years ago

(In reply to David Flanagan [:djf] from comment #89)
> Given that this is now a smoketest issue, I think I should get the
> workaround landed today.  I'll file a followup bug to try to get an
> automated test that can ensure we don't regress this for a third time.

Yes, a separate bug is good enough for me.

(clearing my NI; I still don't know why this worked in v1.4, but it's ok, I can live with this).

Thanks a lot for your work David.

Flags: needinfo?(felash)

Julien Wajsberg [:julienw] (PTO -> Jul 24)

Comment 92

•

10 years ago

I just reverted the patch in bug 1059233 and duped the bug to this bug. David, please check with an uptodate master version of SMS!

Punam Dahiya [:pdahiya]

Comment 93

•

10 years ago

Comment on attachment 8501423 [details] [review]
link to patch on github

Patch looks good and fixes the crash seen in email and SMS app due to memory backed blobs. 
I see a bug on latest master flame-kk while trying to view attached image in SMS app. Symptoms are similar to second part of bug 1059233, since bug 1059233 is  marked resolved, I have created a new bug 1080020 to track this issue.

Attachment #8501423 - Flags: review?(pdahiya) → review+

David Flanagan [:djf]

Assignee

Comment 94

•

10 years ago

I've been pondering whether to take Andrew's additional commits in the patch. Leaning against the original name patch since in the case that we need a name, the blob we are passing is actually different (cropped or rotated) than the original file. So it would arguably be misleading to assert that it is the same.  I'm not sure what to do about the time one. But then I heard that :bent is actively working on a solution in gecko, so now I'm leaning toward just landing what we've got and calling it good enough.

David Flanagan [:djf]

Assignee

Comment 95

•

10 years ago

Landed on master: https://github.com/mozilla-b2g/gaia/commit/b46e28a4d9d042452b2f1fe07a191317caccc6fa

Status: NEW → RESOLVED

Closed: 10 years ago

status-b2g-v2.2: affected → fixed

Resolution: --- → FIXED

David Flanagan [:djf]

Assignee

Comment 96

•

10 years ago

Changing product and component so that I can request uplift of the gaia workaround

Component: DOM → Gaia::Gallery

Product: Core → Firefox OS

Version: Trunk → unspecified

David Flanagan [:djf]

Assignee

Comment 97

•

10 years ago

Comment on attachment 8501423 [details] [review]
link to patch on github

[Approval Request Comment]
[Bug caused by] (feature/regressing bug #): I'm not sure why this bug came back. We had a workaround very much like this in 1.3 and 1.4. Then the gecko bug was fixed and we removed it. But now we have problems again in 2.0 and later. This issue is different than and predates the changes in bug 994190, however.


[User impact] if declined: Images with EXIF orientation can't be attached to email or shared as wallpaper. Cropped images or images with EXIF orientation can't be attached to SMS messages.

[Testing completed]: on master

[Risk to taking this patch] (and alternatives if risky): not very risky because this workaround has been in v1.3 for quite a while now.

[String changes made]: none

Attachment #8501423 - Flags: approval-gaia-v2.1?(fabrice)

Attachment #8501423 - Flags: approval-gaia-v2.0?(fabrice)

David Flanagan [:djf]

Assignee

Comment 98

•

10 years ago

Filed bug 1080104 for removal of this workaround when the underlying bug 1079546 gets fixed.

David Flanagan [:djf]

Assignee

Comment 99

•

10 years ago

Filed bug 1080122 for adding integration tests to make sure this doesn't regress again.

Oliver Nelson [:oliverthor]

Comment 100

•

10 years ago

VERIFIED FIX in v2.2 Master Flame KK by STR of bug child duplicate: bug 1077106
Results: User can attach cropped gallery image to messages app without app closing.

Worth noting that bug 1077667 was RESOLVED DUPLICATE to 1077106, however that issue REPROES following it's own STR and identical Environmental variables. 


Enviromental Variables:
----------------------------------------
Device: Flame 2.2 Master
BuildID: 20141009040206
Gaia: 7b92615bdc97e5c675cd385ec68bc5e47e0c5288
Gecko: f0bb13ef0ee4
Gonk: 52c909e821d107d414f851e267dedcd7aae2cebf
Version: 35.0a1 (2.2 Master)
Firmware: V180
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0

status-b2g-v2.2: fixed → verified

Flags: needinfo?(pbylenga)

Keywords: verifyme

Oliver Nelson [:oliverthor]

Updated

•

10 years ago

Status: RESOLVED → VERIFIED

bhavana bajaj [:bajaj]

Comment 101

•

10 years ago

Comment on attachment 8501423 [details] [review]
link to patch on github

QA please mkae sure to verify once this lands on 2.0, thanks !

Attachment #8501423 - Flags: approval-gaia-v2.1?(fabrice)

Attachment #8501423 - Flags: approval-gaia-v2.1+

Attachment #8501423 - Flags: approval-gaia-v2.0?(fabrice)

Attachment #8501423 - Flags: approval-gaia-v2.0+

Ryan VanderMeulen [:RyanVM]

Comment 102

•

10 years ago

v2.1: https://github.com/mozilla-b2g/gaia/commit/d4aedd4050f74c3503ce2a540b5a3e646ae6c6c4
v2.0: https://github.com/mozilla-b2g/gaia/commit/ee3cfd7700839ed308661bfc6795b162b0cafa4e

status-b2g-v2.0: affected → fixed

status-b2g-v2.1: affected → fixed

Target Milestone: --- → 2.1 S6 (10oct)

Andrew Sutherland [:asuth] (he/him)

Reporter

Comment 103

•

10 years ago

(Updating the bug title to more accurately reflect what the fix on this bug actually does.)

Flags: needinfo?(bugmail)

Summary: Memory-backed Blobs aren't surviving long enough in multi-process environments again (again) → Work around memory-backed Blob lifetime issues by saving them to DeviceStorage-backed temporary files in .gallery/tmp and returning those instead.

Andrew Sutherland [:asuth] (he/him)

Reporter

Updated

•

10 years ago

Summary: Work around memory-backed Blob lifetime issues by saving them to DeviceStorage-backed temporary files in .gallery/tmp and returning those instead. → Work around memory-backed Blob lifetime issues when passed between processes by Web Activities by saving them to DeviceStorage-backed temporary files in .gallery/tmp and returning those instead.

Peter Bylenga [:PBylenga]

Updated

•

10 years ago

QA Whiteboard: [QAnalyst-Triage+]

Flags: needinfo?(pbylenga)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 105

•

10 years ago

Anyone want to give the patches in bug 1079546 a spin? Builds are here: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/bturner@mozilla.com-f7017c8f2012

Kai-Zhen Li [:seinlin][:kli]

Comment 106

•

10 years ago

https://github.com/mozilla-b2g/gaia/commit/ee3cfd7700839ed308661bfc6795b162b0cafa4e

status-b2g-v2.0M: --- → fixed

Mike Lien[:mlien]

Updated

•

10 years ago

Flags: needinfo?(mlien)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 107

•

10 years ago

Bug 1087464 also bit blobs particularly, so you might want to see if workarounds can be removed now that that is fixed.

Martin Shuman [:Marty]

Comment 108

•

10 years ago

Verified fixed in the latest Flame 2.1 and 2.0 builds using the STR from bug 1077106.
MMS attachment is cropped, viewable, and sent properly.

Environmental Variables:
Device: Flame 2.1 (319MB)
BuildID: 20141105001204
Gaia: 154da5e17029a51002d5d9b7df39563d509edde6
Gecko: 3b0c3580a58d
Version: 34.0 (2.1)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Environmental Variables:
Device: Flame 2.0 (319MB)
BuildID: 20141105000201
Gaia: fe2167fa5314c7e71c143a590914cbf3771905a8
Gecko: 093de6b632c5
Version: 32.0 (2.0)
Firmware: V188
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]

status-b2g-v2.0: fixed → verified

status-b2g-v2.1: fixed → verified

Flags: needinfo?(pbylenga)

Peter Bylenga [:PBylenga]

Updated

•

10 years ago

QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]

Flags: needinfo?(pbylenga)

William Hsu [:whsu]

Comment 109

•

10 years ago

Thanks Martin!
Remove "verifyme"

Keywords: verifyme

Coler

Comment 110

•

10 years ago

This issue has been verified successfully on Woodduck 2.0 with STR from bug 1077106.
Reproducing rate: 0/10
See attachment: Verify_Woodduck_MMS.mp4

build version:
Gaia-Rev        87b23fa81c3b59f2ba24b84f7d686f4160d4e7cb
Gecko-Rev       d049d4ef127844121c9cf14d2e8ca91fd9045fcb
Build-ID        20141125050313
Version         32.0

status-b2g-v2.0M: fixed → verified

Coler

Comment 111

•

10 years ago

Attached video Verify_Woodduck_MMS.mp4 — Details

flame-v2.1-email-attach-death.txt 10 years ago Andrew Sutherland [:asuth] (he/him) 18.63 KB, text/plain		Details
flame-kk v2.2 trunk build with MOZ_IPC_MESSAGE_LOG=1 grep-filtered 10 years ago Andrew Sutherland [:asuth] (he/him) 47.07 KB, text/plain		Details
flame-kk-trunk-blob-sadness-MOZ_IPC_MESSAGE_LOG=1.txt FULL 10 years ago Andrew Sutherland [:asuth] (he/him) 388.89 KB, text/plain		Details
flame-kk-trunk-blob-reproduced-filtered.txt 10 years ago Andrew Sutherland [:asuth] (he/him) 54.19 KB, text/plain		Details
flame-kk-trunk-blob-reproduced-full.txt 10 years ago Andrew Sutherland [:asuth] (he/him) 513.78 KB, text/plain		Details
stacktrace 10 years ago :gerard-majax 11.60 KB, text/plain		Details
ipc error stacktrace 10 years ago :gerard-majax 4.08 KB, text/plain		Details
MsgValueError stacktrace 10 years ago :gerard-majax 3.80 KB, text/plain		Details
stepping into AllocPBlobParent 10 years ago :gerard-majax 1.43 KB, text/plain		Details
GetSize()/CreateSlice() 10 years ago :gerard-majax 1.28 KB, text/plain		Details
ActorDestroy stracktrace when mActor is nulled 10 years ago :gerard-majax 4.26 KB, text/plain		Details
link to patch on github 10 years ago David Flanagan [:djf] 46 bytes, text/x-github-pull-request	asuth : review+ pdahiya : review+ bajaj : approval-gaia-v2.0+ bajaj : approval-gaia-v2.1+	Details \| Review
Verify_Woodduck_MMS.mp4 10 years ago Coler 6.61 MB, video/mp4		Details