Closed Bug 1048615 Opened 10 years ago Closed 10 years ago

[Flame] FX OS crash in nsJARChannel::OnStartRequest(nsIRequest, nsISupports) with Sketchbook Squad app

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla34

Project Flags:

blocking-b2g

2.1+

Tracking Flags:

Tracking

Status

b2g-v2.0

---

unaffected

b2g-v2.1

---

verified

People

(Reporter: marcia, Assigned: swu)

References

(
URL
)

Details

(Keywords: crash, regression, reproducible, Whiteboard: [b2g-crash])

Crash Data

Attachments

(4 files, 2 obsolete files)

Patch: Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. 10 years ago Shian-Yow Wu [:swu] 4.63 KB, patch		Details \| Diff \| Splinter Review
Patch: Test case for multiple simultaneous access. 10 years ago Shian-Yow Wu [:swu] 2.69 KB, patch	fabrice : review+	Details \| Diff \| Splinter Review
Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. 10 years ago Shian-Yow Wu [:swu] 7.42 KB, patch	jduell.mcbugs : review+	Details \| Diff \| Splinter Review
Patch(v3): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. r=jduell 10 years ago Shian-Yow Wu [:swu] 8.53 KB, patch	swu : review+	Details \| Diff \| Splinter Review
Verify_video.3gp 10 years ago Paladin 6.99 MB, video/3gpp		Details
logcat_flame_0137.txt 10 years ago Paladin 77.23 KB, text/plain		Details

Marcia Knous [:marcia]

Reporter

Description

•

10 years ago

This bug was filed from the Socorro interface and is 
report bp-57efad21-76df-46a5-a619-d9b8a2140804.
=============================================================

Flame, running:

Gaia   5fd14b8bc428f87f9b5cf9cc49f9a4f362a970fb
SourceStamp e6614d8d85f9
BuildID 20140804040204
Version 34.0a1
Base image: 123

STR:
1. Download Sketchbook Squad app from the Marketplace
2. Launch the app
3. Immediate crash.

Same thing happens with another app from the same developer, Chrono Cash. The crash doesn't happen using the latest 2.0 build on Flame - the app launches, but I don't seem to be able to actually play the game.

Marcia Knous [:marcia]

Reporter

Updated

•

10 years ago

URL: https://marketplace.firefox.com/app/s...

Jason Smith [:jsmith]

Comment 1

•

10 years ago

[Blocking Requested - why for this release]:

Reproducible crash regression on target apps on marketplace.

blocking-b2g: --- → 2.1?

Keywords: regression, regressionwindow-wanted, reproducible

Pi Wei Cheng [:piwei] (inactive)

Updated

•

10 years ago

QA Contact: pcheng

Pi Wei Cheng [:piwei] (inactive)

Comment 2

•

10 years ago

mozilla-inbound regression window:

Last Working Environmental Variables:
Device: Flame
BuildID: 20140717193829
Gaia: 7fbcfbacf286c2a8e41a3a96b6d82f1541880617
Gecko: daa82439f77f
Version: 33.0a1 (2.1 Master)
Firmware V123
User Agent Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

First Broken Environmental Variables:
Device: Flame
BuildID: 20140717194830
Gaia: 7fbcfbacf286c2a8e41a3a96b6d82f1541880617
Gecko: a5446bb5fa7d
Version: 33.0a1 (2.1 Master)
Firmware V123
User Agent Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Gaia is the same.

Gecko pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=daa82439f77f&tochange=a5446bb5fa7d

Caused by bug 988816 ?

QA Whiteboard: [QAnalyst-Triage?]

status-b2g-v2.0: --- → unaffected

status-b2g-v2.1: --- → affected

Flags: needinfo?(jmitchell)

Keywords: regressionwindow-wanted

Joshua Mitchell (Inactive)

Comment 3

•

10 years ago

Caused by bug 988816 ?

Blocks: 988816

QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]

Flags: needinfo?(jmitchell) → needinfo?(swu)

Shian-Yow Wu [:swu]

Assignee

Comment 4

•

10 years ago

It is reproducible on my Flame device with latest mozilla-central, looking at it.

Assignee: nobody → swu

Shian-Yow Wu [:swu]

Assignee

Comment 5

•

10 years ago

Attached patch Patch: Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. (obsolete) — Details — Splinter Review

The crash happens if there are multiple remote open file requests for same JAR file triggered simultaneously when mEnsureChildFd enabled.

In bug 988816, when a different AsyncOpen() request triggered during a progressing AsyncOpen() for same JAR file, we let it do a new IPC, and expect two of them get file descriptor separately.  However, they belong to same TabChild, and use same JAR path in TabChild::GetCachedFileDescriptor(), so the info->mCallback in mCachedFileDescriptorInfos can store only one of them.  That will cause state error and may result in a OnStartRequest() after OnStopRequest(), which is the crash scenario of this bug.

The patch lets the later simultaneous requests wait until first request for same file completes, and obtain file descriptor from JAR cache at that time. This change also reduces unnecessory IPC.

Attachment #8472891 - Flags: review?(jduell.mcbugs)

Flags: needinfo?(swu)

Shian-Yow Wu [:swu]

Assignee

Comment 6

•

10 years ago

Attached patch Patch: Test case for multiple simultaneous access. — Details — Splinter Review

Fabrice, could you help to review on the test case?

Attachment #8473467 - Flags: review?(fabrice)

[:fabrice] Fabrice Desré

Updated

•

10 years ago

Attachment #8473467 - Flags: review?(fabrice) → review+

Jason Duell

Comment 7

•

10 years ago

Comment on attachment 8472891 [details] [diff] [review]
Patch: Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled.

Review of attachment 8472891 [details] [diff] [review]:
-----------------------------------------------------------------

This code looks fine to me, except... I can't figure out if it will correctly handle the case where multiple JARChannels open the same remoteopenfile:// uri, AND the 1st one does NOT specify EnsureChildFd() but later ones do.  It looks to me like in that case the 1st request will complete but not store the fd in the JarCache, and then the 2nd, 3rd, etc JarChannels will neither have the fd from IPDL, nor get it from the JarCache.  But perhaps I'm missing something?  In any case I'd feel better if we had a test for that use case (one can be added in an xpcshell test fairly easily I suspect).

::: modules/libjar/nsJARChannel.cpp
@@ +1061,5 @@
> +                if (!jarCache) {
> +                    rv = NS_ERROR_FAILURE;
> +                }
> +                PRFileDesc *jar_fd = nullptr;
> +                jarCache->GetFd(mJarFile, &jar_fd);

shouldn't you test for success here? What if the jarCache doesn't have the fd, as I suspect can happen?

Jason Duell

Comment 8

•

10 years ago

^^^ see previous comment

Flags: needinfo?(swu)

Jason Duell

Comment 9

•

10 years ago

Note: if I'm wrong and this code does handle the case I mention in comment 7, feel free to mark +r and land.  If it needs a fix flag me for review again.

Shian-Yow Wu [:swu]

Assignee

Comment 10

•

10 years ago

(In reply to Jason Duell (:jduell) from comment #7)
> Comment on attachment 8472891 [details] [diff] [review]
> Patch: Fix crash when simultaneously opening remote JAR file with
> mEnsureChildFd enabled.
> 
> Review of attachment 8472891 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This code looks fine to me, except... I can't figure out if it will
> correctly handle the case where multiple JARChannels open the same
> remoteopenfile:// uri, AND the 1st one does NOT specify EnsureChildFd() but
> later ones do.  It looks to me like in that case the 1st request will
> complete but not store the fd in the JarCache, and then the 2nd, 3rd, etc
> JarChannels will neither have the fd from IPDL, nor get it from the
> JarCache.  But perhaps I'm missing something?  In any case I'd feel better
> if we had a test for that use case (one can be added in an xpcshell test
> fairly easily I suspect).

That's for catching this, that's a good point.  In such case, the latter JARChannels will fail to get fd.

Currently I have two ideas in mind for this case:
1) For JARChannel request with EnsureChildFd(), pass different path name as index to TabChild::GetCachedFileDescriptor().  For example, if the original path name is "/path/application.zip", we use "/path/application.zip_1" as index for EnsureChildFd() case.  This breaks the meaning of path a little, but should be reasonable if the purpose of path argument is just for indexing.  

2) Once JARChannel decided to try remote opening, set jarCache->SetMustCacheFd() with true before checking the RemoteOpenFileInProgress().  This should make the fd be stored in JarCache for the 1st request, but it may need more handling to avoid possible racing.

            if (mEnsureChildFd && jarCache) {
                jarCache->SetMustCacheFd(remoteFile, true);
            }    

            if (gJarHandler->RemoteOpenFileInProgress(remoteFile, this)) {
                // JarHandler will trigger OnRemoteFileOpen() after the first
                // request for this file completes and we'll get a JAR cache
                // hit.
                return NS_OK;
            }    

Do you have any suggestions?

> 
> ::: modules/libjar/nsJARChannel.cpp
> @@ +1061,5 @@
> > +                if (!jarCache) {
> > +                    rv = NS_ERROR_FAILURE;
> > +                }
> > +                PRFileDesc *jar_fd = nullptr;
> > +                jarCache->GetFd(mJarFile, &jar_fd);
> 
> shouldn't you test for success here? What if the jarCache doesn't have the
> fd, as I suspect can happen?

If GetFd() fails to get fd, rv returned from the next line will also fail.
  rv = SetRemoteNSPRFileDesc(jar_fd);

Then it resulted a NotifyError(rv) in same function.

Flags: needinfo?(swu) → needinfo?(jduell.mcbugs)

Shian-Yow Wu [:swu]

Assignee

Comment 11

•

10 years ago

(In reply to Shian-Yow Wu [:swu] from comment #10)
> That's for catching this, that's a good point.  In such case, the latter

s/That's/Thanks/

Shian-Yow Wu [:swu]

Assignee

Comment 12

•

10 years ago

Attached patch Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. (obsolete) — Details — Splinter Review

Changes in this patch:
1. Added xpcshell test for comment 7 case.
2. Fixed comment 7 case by approach 2 of comment 10.

Attachment #8472891 - Attachment is obsolete: true

Attachment #8472891 - Flags: review?(jduell.mcbugs)

Attachment #8475187 - Flags: review?(jduell.mcbugs)

Jason Duell

Comment 13

•

10 years ago

Comment on attachment 8475187 [details] [diff] [review]
Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled.

Review of attachment 8475187 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libjar/test/unit/test_jarchannel.js
@@ +228,5 @@
> +        chan1.asyncOpen(new Listener(function(l) {
> +        }), null);
> +
> +        // Open channel 2 with ensureChildFd()
> +        var chan2 = ios.newChannel(jarBase + "/inner2.zip", null, null)

why ask for inner1, then inner2?  Aren't you best off asking for the exact same URI?  (IIUC it shouldn't make a difference because these are inner zips, and we'll wind up remoteOpenFile-ing the same test_bug637286.zip, right?)

Attachment #8475187 - Flags: review?(jduell.mcbugs) → review+

Jason Duell

Comment 14

•

10 years ago

> 2) Once JARChannel decided to try remote opening, set jarCache->SetMustCacheFd() with true 
> before checking the RemoteOpenFileInProgress().  This should make the fd be stored in 
> JarCache for the 1st request, but it may need more handling to avoid possible racing.

I think that there's hopefully no race: if a 2nd JarChannel with the same URI comes along and there's already one pending, we'll call the JarCache>SetMustCacheFd synchronously, so it'll be set whenever the 1st request finally finishes and stores the entry in the Jar cache.   That was my understanding, at least, but I didn't look at it too rigorously.

Flags: needinfo?(jduell.mcbugs)

Jason Smith [:jsmith]

Updated

•

10 years ago

QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]

Jason Duell

Comment 15

•

10 years ago

Comment on attachment 8475187 [details] [diff] [review]
Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled.

Removing r+ pending discussion in bug 988816 comment 47 - 48.  We need to figure out if it's safe to share file offsets (which is what we'll do if we cache the fd like this).  It's definitely not safe in the general case--i.e. the semantics are very different than regular nsIFile.Open().  But perhaps we can get away with it for the use cases.

If that doesn't pan out we'll need to do a remote IPDL open for each fd requested.

Attachment #8475187 - Flags: review+

Jason Duell

Comment 16

•

10 years ago

One other possible idea here:  since we know the only file we're remoteOpen-ing is application.zip, for performance we could implement some kind of cache of separately open()'d fd's.  I.e we could specify how many fd's we want to cache on the child, and whenever one is handed out, we could IPDL back to the parent to get a new one, without actually suffering the latency of waiting for the IPDL and the parent-side I/O.

But that's just an optimization idea--we might be fine with something that just does IPDL for each fd, but fixes the issue where TabChild only supports a single pending request at a time.

Shian-Yow Wu [:swu]

Assignee

Comment 17

•

10 years ago

(In reply to Jason Duell (:jduell) from comment #15)
> Comment on attachment 8475187 [details] [diff] [review]
> Patch(v2): Fix crash when simultaneously opening remote JAR file with
> mEnsureChildFd enabled.
> 
> Removing r+ pending discussion in bug 988816 comment 47 - 48.  We need to
> figure out if it's safe to share file offsets (which is what we'll do if we
> cache the fd like this).  It's definitely not safe in the general case--i.e.
> the semantics are very different than regular nsIFile.Open().  But perhaps
> we can get away with it for the use cases.

So far, the fd from JAR cache is only for the purpose of mmap, which is safe.  If there is no requirement to use the cached fd on other purpose than mmap in the future, it seems worth to use cached fd on limited use case to win performance over multiple IPDL open() requests.

(In reply to Jason Duell (:jduell) from comment #16)
> One other possible idea here:  since we know the only file we're
> remoteOpen-ing is application.zip, for performance we could implement some
> kind of cache of separately open()'d fd's.  I.e we could specify how many
> fd's we want to cache on the child, and whenever one is handed out, we could
> IPDL back to the parent to get a new one, without actually suffering the
> latency of waiting for the IPDL and the parent-side I/O.

We don't know the potential number of burst remote open requests in advance, if cache buffer underrun happens, the remaining requests will be pended.  Therefore, if we decided to support separate fd openings, enhancing TabChild to support multiple pending requests should be the first thing to do.

> 
> But that's just an optimization idea--we might be fine with something that
> just does IPDL for each fd, but fixes the issue where TabChild only supports
> a single pending request at a time.

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Comment 18

•

10 years ago

(In reply to Jason Duell (:jduell) from comment #16)
> One other possible idea here:  since we know the only file we're
> remoteOpen-ing is application.zip

Bug 1034143 is basically the same problem, but for temporary files used to implement jar:http(s):// URLs.  We seem to consider our IPC framework to be too inefficient to use it in the critical path here; would that also apply to those files?

Jason Duell

Comment 19

•

10 years ago

Comment on attachment 8475187 [details] [diff] [review]
Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled.

Review of attachment 8475187 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, I need to think about how we'll handle bug 103143 for that reason.

For now, since this code apparently works despite the dup(), I'm going to +r and we'll document the limitation in bug 1055966

Attachment #8475187 - Flags: review+

Shian-Yow Wu [:swu]

Assignee

Comment 20

•

10 years ago

Attached patch Patch(v3): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. r=jduell — Details — Splinter Review

Carrying r+ from jduell and:
1) Addressed comment 13
2) Fixed try server failure on Windows platform

Attachment #8475187 - Attachment is obsolete: true

Attachment #8476590 - Flags: review+

Shian-Yow Wu [:swu]

Assignee

Comment 21

•

10 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/f3c7ef477d3d
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/b7529dcda6eb

Try server result:
https://tbpl.mozilla.org/?tree=Try&rev=e70ceba8b544

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 22

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/f3c7ef477d3d
https://hg.mozilla.org/mozilla-central/rev/b7529dcda6eb

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla34

bhavana bajaj [:bajaj]

Comment 23

•

10 years ago

Setting this as fixed on 2.1 given it landed before the merge.

blocking-b2g: 2.1? → 2.1+

status-b2g-v2.1: affected → fixed

Paladin

Comment 24

•

10 years ago

Attached video Verify_video.3gp — Details

This bug has been verified to fail on Flame 2.1
STR:
1. Download Sketchbook Squad app from the Marketplace
2. Launch the app
**The app launches, but I don't seem to be able to actually play the game.

See attachment: Verify_video.3gp and logcat_flame_0137.txt
Reproducing rate: 5/5
Flame2.1 build:
Gaia-Rev        38e17b0219cbc50a4ad6f51101898f89e513a552
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/8b92c4b8f59a
Build-ID        20141205001201
Version         34.0

Flags: needinfo?(mlien)

Paladin

Comment 25

•

10 years ago

Attached file logcat_flame_0137.txt — Details

Mike Lien[:mlien]

Comment 26

•

10 years ago

Refer to comment 0 and comment 24, crash problem doesn't happen anymore
Mark as VERIFIED FIXED

Status: RESOLVED → VERIFIED

status-b2g-v2.1: fixed → verified

Flags: needinfo?(mlien)

You need to log in before you can comment on or make changes to this bug.

[Flame] FX OS crash in nsJARChannel::OnStartRequest(nsIRequest*, nsISupports*) with Sketchbook Squad app

[Flame] FX OS crash in nsJARChannel::OnStartRequest(nsIRequest, nsISupports) with Sketchbook Squad app