Closed
Bug 1048615
Opened 10 years ago
Closed 10 years ago
[Flame] FX OS crash in nsJARChannel::OnStartRequest(nsIRequest*, nsISupports*) with Sketchbook Squad app
Categories
(Core :: Networking, defect)
Tracking
()
Tracking | Status | |
---|---|---|
b2g-v2.0 | --- | unaffected |
b2g-v2.1 | --- | verified |
People
(Reporter: marcia, Assigned: swu)
References
()
Details
(Keywords: crash, regression, reproducible, Whiteboard: [b2g-crash])
Crash Data
Attachments
(4 files, 2 obsolete files)
2.69 KB,
patch
|
fabrice
:
review+
|
Details | Diff | Splinter Review |
8.53 KB,
patch
|
swu
:
review+
|
Details | Diff | Splinter Review |
6.99 MB,
video/3gpp
|
Details | |
77.23 KB,
text/plain
|
Details |
This bug was filed from the Socorro interface and is report bp-57efad21-76df-46a5-a619-d9b8a2140804. ============================================================= Flame, running: Gaia 5fd14b8bc428f87f9b5cf9cc49f9a4f362a970fb SourceStamp e6614d8d85f9 BuildID 20140804040204 Version 34.0a1 Base image: 123 STR: 1. Download Sketchbook Squad app from the Marketplace 2. Launch the app 3. Immediate crash. Same thing happens with another app from the same developer, Chrono Cash. The crash doesn't happen using the latest 2.0 build on Flame - the app launches, but I don't seem to be able to actually play the game.
Reporter | ||
Updated•10 years ago
|
Comment 1•10 years ago
|
||
[Blocking Requested - why for this release]: Reproducible crash regression on target apps on marketplace.
blocking-b2g: --- → 2.1?
Updated•10 years ago
|
QA Contact: pcheng
Comment 2•10 years ago
|
||
mozilla-inbound regression window: Last Working Environmental Variables: Device: Flame BuildID: 20140717193829 Gaia: 7fbcfbacf286c2a8e41a3a96b6d82f1541880617 Gecko: daa82439f77f Version: 33.0a1 (2.1 Master) Firmware V123 User Agent Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0 First Broken Environmental Variables: Device: Flame BuildID: 20140717194830 Gaia: 7fbcfbacf286c2a8e41a3a96b6d82f1541880617 Gecko: a5446bb5fa7d Version: 33.0a1 (2.1 Master) Firmware V123 User Agent Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0 Gaia is the same. Gecko pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=daa82439f77f&tochange=a5446bb5fa7d Caused by bug 988816 ?
QA Whiteboard: [QAnalyst-Triage?]
status-b2g-v2.0:
--- → unaffected
status-b2g-v2.1:
--- → affected
Flags: needinfo?(jmitchell)
Keywords: regressionwindow-wanted
Comment 3•10 years ago
|
||
Caused by bug 988816 ?
Blocks: 988816
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell) → needinfo?(swu)
Assignee | ||
Comment 4•10 years ago
|
||
It is reproducible on my Flame device with latest mozilla-central, looking at it.
Assignee: nobody → swu
Assignee | ||
Comment 5•10 years ago
|
||
The crash happens if there are multiple remote open file requests for same JAR file triggered simultaneously when mEnsureChildFd enabled. In bug 988816, when a different AsyncOpen() request triggered during a progressing AsyncOpen() for same JAR file, we let it do a new IPC, and expect two of them get file descriptor separately. However, they belong to same TabChild, and use same JAR path in TabChild::GetCachedFileDescriptor(), so the info->mCallback in mCachedFileDescriptorInfos can store only one of them. That will cause state error and may result in a OnStartRequest() after OnStopRequest(), which is the crash scenario of this bug. The patch lets the later simultaneous requests wait until first request for same file completes, and obtain file descriptor from JAR cache at that time. This change also reduces unnecessory IPC.
Attachment #8472891 -
Flags: review?(jduell.mcbugs)
Flags: needinfo?(swu)
Assignee | ||
Comment 6•10 years ago
|
||
Fabrice, could you help to review on the test case?
Attachment #8473467 -
Flags: review?(fabrice)
Updated•10 years ago
|
Attachment #8473467 -
Flags: review?(fabrice) → review+
Comment 7•10 years ago
|
||
Comment on attachment 8472891 [details] [diff] [review] Patch: Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. Review of attachment 8472891 [details] [diff] [review]: ----------------------------------------------------------------- This code looks fine to me, except... I can't figure out if it will correctly handle the case where multiple JARChannels open the same remoteopenfile:// uri, AND the 1st one does NOT specify EnsureChildFd() but later ones do. It looks to me like in that case the 1st request will complete but not store the fd in the JarCache, and then the 2nd, 3rd, etc JarChannels will neither have the fd from IPDL, nor get it from the JarCache. But perhaps I'm missing something? In any case I'd feel better if we had a test for that use case (one can be added in an xpcshell test fairly easily I suspect). ::: modules/libjar/nsJARChannel.cpp @@ +1061,5 @@ > + if (!jarCache) { > + rv = NS_ERROR_FAILURE; > + } > + PRFileDesc *jar_fd = nullptr; > + jarCache->GetFd(mJarFile, &jar_fd); shouldn't you test for success here? What if the jarCache doesn't have the fd, as I suspect can happen?
Comment 9•10 years ago
|
||
Note: if I'm wrong and this code does handle the case I mention in comment 7, feel free to mark +r and land. If it needs a fix flag me for review again.
Assignee | ||
Comment 10•10 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #7) > Comment on attachment 8472891 [details] [diff] [review] > Patch: Fix crash when simultaneously opening remote JAR file with > mEnsureChildFd enabled. > > Review of attachment 8472891 [details] [diff] [review]: > ----------------------------------------------------------------- > > This code looks fine to me, except... I can't figure out if it will > correctly handle the case where multiple JARChannels open the same > remoteopenfile:// uri, AND the 1st one does NOT specify EnsureChildFd() but > later ones do. It looks to me like in that case the 1st request will > complete but not store the fd in the JarCache, and then the 2nd, 3rd, etc > JarChannels will neither have the fd from IPDL, nor get it from the > JarCache. But perhaps I'm missing something? In any case I'd feel better > if we had a test for that use case (one can be added in an xpcshell test > fairly easily I suspect). That's for catching this, that's a good point. In such case, the latter JARChannels will fail to get fd. Currently I have two ideas in mind for this case: 1) For JARChannel request with EnsureChildFd(), pass different path name as index to TabChild::GetCachedFileDescriptor(). For example, if the original path name is "/path/application.zip", we use "/path/application.zip_1" as index for EnsureChildFd() case. This breaks the meaning of path a little, but should be reasonable if the purpose of path argument is just for indexing. 2) Once JARChannel decided to try remote opening, set jarCache->SetMustCacheFd() with true before checking the RemoteOpenFileInProgress(). This should make the fd be stored in JarCache for the 1st request, but it may need more handling to avoid possible racing. if (mEnsureChildFd && jarCache) { jarCache->SetMustCacheFd(remoteFile, true); } if (gJarHandler->RemoteOpenFileInProgress(remoteFile, this)) { // JarHandler will trigger OnRemoteFileOpen() after the first // request for this file completes and we'll get a JAR cache // hit. return NS_OK; } Do you have any suggestions? > > ::: modules/libjar/nsJARChannel.cpp > @@ +1061,5 @@ > > + if (!jarCache) { > > + rv = NS_ERROR_FAILURE; > > + } > > + PRFileDesc *jar_fd = nullptr; > > + jarCache->GetFd(mJarFile, &jar_fd); > > shouldn't you test for success here? What if the jarCache doesn't have the > fd, as I suspect can happen? If GetFd() fails to get fd, rv returned from the next line will also fail. rv = SetRemoteNSPRFileDesc(jar_fd); Then it resulted a NotifyError(rv) in same function.
Flags: needinfo?(swu) → needinfo?(jduell.mcbugs)
Assignee | ||
Comment 11•10 years ago
|
||
(In reply to Shian-Yow Wu [:swu] from comment #10) > That's for catching this, that's a good point. In such case, the latter s/That's/Thanks/
Assignee | ||
Comment 12•10 years ago
|
||
Changes in this patch: 1. Added xpcshell test for comment 7 case. 2. Fixed comment 7 case by approach 2 of comment 10.
Attachment #8472891 -
Attachment is obsolete: true
Attachment #8472891 -
Flags: review?(jduell.mcbugs)
Attachment #8475187 -
Flags: review?(jduell.mcbugs)
Comment 13•10 years ago
|
||
Comment on attachment 8475187 [details] [diff] [review] Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. Review of attachment 8475187 [details] [diff] [review]: ----------------------------------------------------------------- ::: modules/libjar/test/unit/test_jarchannel.js @@ +228,5 @@ > + chan1.asyncOpen(new Listener(function(l) { > + }), null); > + > + // Open channel 2 with ensureChildFd() > + var chan2 = ios.newChannel(jarBase + "/inner2.zip", null, null) why ask for inner1, then inner2? Aren't you best off asking for the exact same URI? (IIUC it shouldn't make a difference because these are inner zips, and we'll wind up remoteOpenFile-ing the same test_bug637286.zip, right?)
Attachment #8475187 -
Flags: review?(jduell.mcbugs) → review+
Comment 14•10 years ago
|
||
> 2) Once JARChannel decided to try remote opening, set jarCache->SetMustCacheFd() with true
> before checking the RemoteOpenFileInProgress(). This should make the fd be stored in
> JarCache for the 1st request, but it may need more handling to avoid possible racing.
I think that there's hopefully no race: if a 2nd JarChannel with the same URI comes along and there's already one pending, we'll call the JarCache>SetMustCacheFd synchronously, so it'll be set whenever the 1st request finally finishes and stores the entry in the Jar cache. That was my understanding, at least, but I didn't look at it too rigorously.
Flags: needinfo?(jduell.mcbugs)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Comment 15•10 years ago
|
||
Comment on attachment 8475187 [details] [diff] [review] Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. Removing r+ pending discussion in bug 988816 comment 47 - 48. We need to figure out if it's safe to share file offsets (which is what we'll do if we cache the fd like this). It's definitely not safe in the general case--i.e. the semantics are very different than regular nsIFile.Open(). But perhaps we can get away with it for the use cases. If that doesn't pan out we'll need to do a remote IPDL open for each fd requested.
Attachment #8475187 -
Flags: review+
Comment 16•10 years ago
|
||
One other possible idea here: since we know the only file we're remoteOpen-ing is application.zip, for performance we could implement some kind of cache of separately open()'d fd's. I.e we could specify how many fd's we want to cache on the child, and whenever one is handed out, we could IPDL back to the parent to get a new one, without actually suffering the latency of waiting for the IPDL and the parent-side I/O. But that's just an optimization idea--we might be fine with something that just does IPDL for each fd, but fixes the issue where TabChild only supports a single pending request at a time.
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #15) > Comment on attachment 8475187 [details] [diff] [review] > Patch(v2): Fix crash when simultaneously opening remote JAR file with > mEnsureChildFd enabled. > > Removing r+ pending discussion in bug 988816 comment 47 - 48. We need to > figure out if it's safe to share file offsets (which is what we'll do if we > cache the fd like this). It's definitely not safe in the general case--i.e. > the semantics are very different than regular nsIFile.Open(). But perhaps > we can get away with it for the use cases. So far, the fd from JAR cache is only for the purpose of mmap, which is safe. If there is no requirement to use the cached fd on other purpose than mmap in the future, it seems worth to use cached fd on limited use case to win performance over multiple IPDL open() requests. (In reply to Jason Duell (:jduell) from comment #16) > One other possible idea here: since we know the only file we're > remoteOpen-ing is application.zip, for performance we could implement some > kind of cache of separately open()'d fd's. I.e we could specify how many > fd's we want to cache on the child, and whenever one is handed out, we could > IPDL back to the parent to get a new one, without actually suffering the > latency of waiting for the IPDL and the parent-side I/O. We don't know the potential number of burst remote open requests in advance, if cache buffer underrun happens, the remaining requests will be pended. Therefore, if we decided to support separate fd openings, enhancing TabChild to support multiple pending requests should be the first thing to do. > > But that's just an optimization idea--we might be fine with something that > just does IPDL for each fd, but fixes the issue where TabChild only supports > a single pending request at a time.
Comment 18•10 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #16) > One other possible idea here: since we know the only file we're > remoteOpen-ing is application.zip Bug 1034143 is basically the same problem, but for temporary files used to implement jar:http(s):// URLs. We seem to consider our IPC framework to be too inefficient to use it in the critical path here; would that also apply to those files?
Comment 19•10 years ago
|
||
Comment on attachment 8475187 [details] [diff] [review] Patch(v2): Fix crash when simultaneously opening remote JAR file with mEnsureChildFd enabled. Review of attachment 8475187 [details] [diff] [review]: ----------------------------------------------------------------- Yes, I need to think about how we'll handle bug 103143 for that reason. For now, since this code apparently works despite the dup(), I'm going to +r and we'll document the limitation in bug 1055966
Attachment #8475187 -
Flags: review+
Assignee | ||
Comment 20•10 years ago
|
||
Carrying r+ from jduell and: 1) Addressed comment 13 2) Fixed try server failure on Windows platform
Attachment #8475187 -
Attachment is obsolete: true
Attachment #8476590 -
Flags: review+
Assignee | ||
Comment 21•10 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/f3c7ef477d3d remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7529dcda6eb Try server result: https://tbpl.mozilla.org/?tree=Try&rev=e70ceba8b544
https://hg.mozilla.org/mozilla-central/rev/f3c7ef477d3d https://hg.mozilla.org/mozilla-central/rev/b7529dcda6eb
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment 23•10 years ago
|
||
Setting this as fixed on 2.1 given it landed before the merge.
blocking-b2g: 2.1? → 2.1+
Comment 24•10 years ago
|
||
This bug has been verified to fail on Flame 2.1 STR: 1. Download Sketchbook Squad app from the Marketplace 2. Launch the app **The app launches, but I don't seem to be able to actually play the game. See attachment: Verify_video.3gp and logcat_flame_0137.txt Reproducing rate: 5/5 Flame2.1 build: Gaia-Rev 38e17b0219cbc50a4ad6f51101898f89e513a552 Gecko-Rev https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/8b92c4b8f59a Build-ID 20141205001201 Version 34.0
Flags: needinfo?(mlien)
Comment 25•10 years ago
|
||
Comment 26•10 years ago
|
||
Refer to comment 0 and comment 24, crash problem doesn't happen anymore Mark as VERIFIED FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•